-[ BFi - English version ]----------------------------------------------------
     BFi is an e-zine written by the Italian hacker community.
     Full source code and original Italian version are available at:
        http://bfi.s0ftpj.org/dev/BFi12-dev-10.tar.gz
     English version translated by Tanith, nail, Raist_XOL and Zen.
------------------------------------------------------------------------------

==============================================================================
--------------------[ BFi12-dev - file 10 - 14/03/2004 ]----------------------
==============================================================================

-[ DiSCLAiMER ]---------------------------------------------------------------
     Everything contained in BFi is for informative and educational
     purposes only. In no event shall the authors be considered liable for
     damages caused to people or things due to the use of code, programs,
     pieces of information or techniques published in the e-zine.
     BFi is a free and autonomous way of expression; we, the authors, are
     as free to write BFi as you are free to go on reading or to stop doing
     it right now. Therefore, if you think you could be harmed by the topics
     covered and/or by the way they are covered,
     * stop reading immediately and remove these files from your computer *.
     By going on, you, the reader, keep to yourself all responsibility for
     the use you make of the information published in BFi.
     You are not allowed to post BFi to the newsgroups or to spread *parts*
     of the magazine: please distribute BFi in its original and complete
     form.
------------------------------------------------------------------------------

-[ HACKiNG ]------------------------------------------------------------------
---[ STEGAN0GRAPHY APPLIED 0N NETW0RK SESSi0NS AND NEiGHB0URH00D
-----[ vecna


Ah, what a nice thing steganography is.
If you want to clarify your ideas, you can read
http://citeseer.nj.nec.com/fridrich02practical.html

Unfortunately, my experience as a teacher does not help me when I write an
article: maybe I take too many things for granted. So, to help you
understand, I suggest you read the links wherever they appear. Reading the
article will then keep you busy for several days... but eventually it will
be worth it. :)


1) STATE OF THE ART

Steganography is usually considered something you apply to image files.
However, whoever is a bit more interested in file systems, or has understood
its essence, can apply it anywhere. Anyway, I will leave the tastiest
frenzies for the third part of the article.

Applying steganography to network sessions is not much different from
building covert channels; the only difference is that covert channels are
usually thought of as hidden communication systems meant to get past a
network protection system.
If I have a firewall that only lets me out through port 80, then with an
http tunnel (http://www.nocrew.org/software/httptunnel.html, or even
http://openvpn.sf.net), relaying through a machine that interprets my
tunnel, I can encapsulate any kind of traffic within that channel and do
things I otherwise couldn't, including reaching machines within the
protected network.

About covert channels: in 2000 the SANS staff mentioned a work by fusys and
one of mine in this analysis: http://www.s0ftpj.org/docs/covert_shells.htm .

As we said before, network traffic steganography can be considered a
specialized type of "covert communication channel".
Network session steganography means "putting data within packets that
appear to be empty"; covert channels, on the contrary, usually mean "putting
arbitrary data within packets that normally contain other kinds of data, so
that they are accepted because, at a superficial look, they pass for what
they appear to be".

The first steganography project applied to networking is this one:
http://public.lanl.gov/cdi/networkstenganography.htm .
There you'll find a document about steganography applied to networking; the
technique tested below is mentioned there, too. Later on, you'll understand
why I talk about it.

In order to apply steganography to TCP sessions, or to IP packets in
general, we need to find a place to put the data. Usually the header is the
part that gets abused: knowing the meaning of each field, since you've
already read rfc791 (http://www.faqs.org/rfcs/rfc791.html), you look for
fields which can host data, while taking care that the packet stays valid
and therefore reaches its destination.

IP Header:

  0       4       8               16    19        24             32
  -----------------------------------------------------------------
  | VERS  | HLEN  | Service Type  |         Total Length          |
  -----------------------------------------------------------------
  |        Identification         |Flags|    Fragment Offset      |
  -----------------------------------------------------------------
  |                       Source IP Address                       |
  -----------------------------------------------------------------
  |                    Destination IP Address                     |
  -----------------------------------------------------------------
  |                  IP Options                   |    Padding    |
  -----------------------------------------------------------------
  |                             Data                              |
  -----------------------------------------------------------------

All of these fields have a purpose and we can't hope to find empty space
(except for padding, which should be filled with zeroes according to the
RFC...): we have to find out how these fields are filled in by our TCP/IP
stack and, based on that, see whether we can manipulate them to insert data
without subverting the normal operation of the protocols.

However, while IP leaves very little room to act, TCP offers more, putting
at our disposal many more fields tied to session control.
These fields can carry hidden data when there is no real session to control:

  0       4       8               16    19        24             32
  -----------------------------------------------------------------
  |          Source Port          |       Destination Port        |
  -----------------------------------------------------------------
  |                        Sequence Number                        |
  -----------------------------------------------------------------
  |                     Acknowledgment Number                     |
  -----------------------------------------------------------------
  | HLEN  | Reserved  | Code Bits |            Window             |
  -----------------------------------------------------------------
  |           Checksum            |        Urgent Pointer         |
  -----------------------------------------------------------------
  |                    Options                    |    Padding    |
  -----------------------------------------------------------------
  |                             Data                              |
  -----------------------------------------------------------------

For the meaning of the TCP header fields, I refer you to
http://www.faqs.org/rfcs/rfc793.html .

The few tools that have been implemented, however, had an obvious problem:
since they hid data within unlikely fields or in unusual ways, they were not
at all resistant to steganalysis.

Steganalysis is the process by which you work out that information has been
exchanged through a non-conventional channel. It is not really an analysis
technique that can be described the way cryptanalysis can: it is rather a
matter of noticing what is too evidently "out of standard" and, once the
deviation has been identified, checking whether it is a false positive due
to errors or whether, because of its frequency, the "error" is actually
caused by a user controlling it in order to hide something.
A tool introduced in "Covert Channels in the TCP/IP Protocol Suite"
generates traffic such as this:

18:50:13.551117 nemesis.psionic.com.7180 > blast.psionic.com.www: S
    537657344:537657344(0) win 512 (ttl 64, id 18432)
18:50:14.551117 nemesis.psionic.com.51727 > blast.psionic.com.www: S
    1393295360:1393295360(0) win 512 (ttl 64, id 17664)
18:50:15.551117 nemesis.psionic.com.9473 > blast.psionic.com.www: S
    3994419200:3994419200(0) win 512 (ttl 64, id 19456)
18:50:16.551117 nemesis.psionic.com.56855 > blast.psionic.com.www: S
    3676635136:3676635136(0) win 512 (ttl 64, id 19456)
18:50:17.551117 nemesis.psionic.com.1280 > blast.psionic.com.www: S
    774242304:774242304(0) win 512 (ttl 64, id 20224)
18:50:18.551117 nemesis.psionic.com.21004 > blast.psionic.com.www: S
    3843751936:3843751936(0) win 512 (ttl 64, id 2560)

It serves here as an example of hiding data in the IP ID field. To make the
packets look meaningful, the SYN flag is set in the TCP header, so that the
source host pretends to be connecting to the remote server.
Looking closer, however, you can observe that no TCP/IP stack ever
generates, second after second, packets towards the same port from a source
port that changes randomly. Even supposing these details were corrected, so
that the traffic looked more like a connection attempt to a port 80 that
does not answer, the traffic needed to transfer even only 1500 bytes would
still be unusual: 1500 SYN packets spaced one second apart, towards the same
port, make no sense. If they went to different ports it could pass for a
scan, if it were faster it could pass for a flood, but this is too strange
to be ignored twice.

It is quite easy to decide to sacrifice some fields, or several of them
(ports, sequence number, acknowledgment number, urgent pointer, IP
identifier) and so transmit many more bytes in a single packet, but this
would completely subvert the normal operation of the protocols, making it a
covert channel rather than a steganographic system.
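To make the weakness concrete: every ID in the excerpt above is a multiple of 256, and dividing each one by 256 yields a printable ASCII code. The following is a minimal sketch, not the tool's own code; the function names `decode_ids` and `low_byte_always_zero` are made up for this illustration, and the assumption is that the payload byte sits in the upper 8 bits of the ID as printed by tcpdump.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Recover one byte per packet from IDs as printed by tcpdump.
 * Assumption: the payload byte is in the upper 8 bits of the ID
 * and the lower 8 bits are zero, as in the excerpt above.
 * 'out' must have room for n + 1 bytes.
 */
size_t decode_ids(const uint16_t *ids, size_t n, char *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (char)(ids[i] >> 8);
    out[n] = '\0';
    return n;
}

/*
 * Crude detector: returns 1 when every ID in the series has a zero
 * low byte, something an incremental or random generator is very
 * unlikely to produce over more than a few packets.
 */
int low_byte_always_zero(const uint16_t *ids, size_t n)
{
    size_t i;

    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        if (ids[i] & 0xff)
            return 0;
    return 1;
}
```

Fed with the six IDs of the excerpt (18432, 17664, 19456, 19456, 20224, 2560), `decode_ids` gives back the five letters of HELLO followed by a newline, and the detector fires: an analyst needs nothing more than this to tell the session apart from ordinary traffic.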
What I have shown cannot possibly be the right way.
The right way, if you want to apply steganography at the network or
transport layer, is to look at an ordinary connection and try to understand
how data can be inserted into it, while making sure that an external
observer will not be able to detect anything strange.

This is a real session managed by a TCP/IP stack and by the lynx web
browser, and this is how our steganographed session will look in the end:
APPARENTLY the same.

13:28:07.500468 192.168.1.69.58067 > 66.102.11.104.80: SWE [tcp sum ok]
    158029937:158029937(0) win 5840 (DF) (ttl 64, id 17888, len 60)
13:28:07.598985 66.102.11.104.80 > 192.168.1.69.58067: S [tcp sum ok]
    2710819308:2710819308(0) ack 158029938 win 8190
    (ttl 244, id 1970, len 44)
13:28:07.599064 192.168.1.69.58067 > 66.102.11.104.80: . [tcp sum ok]
    1:1(0) ack 1 win 5840 (DF) (ttl 64, id 17889, len 40)
13:28:07.603015 192.168.1.69.58067 > 66.102.11.104.80: .
    1:1413(1412) ack 1 win 5840 (DF) (ttl 64, id 17890, len 1452)
13:28:07.603042 192.168.1.69.58067 > 66.102.11.104.80: P
    1413:2312(899) ack 1 win 5840 (DF) (ttl 64, id 17891, len 939)
13:28:07.863177 66.102.11.104.80 > 192.168.1.69.58067: .
    1:1413(1412) ack 2312 win 32476 [tos 0x10] (ttl 53, id 2145, len 1452)
13:28:07.863268 192.168.1.69.58067 > 66.102.11.104.80: . [tcp sum ok]
    2312:2312(0) ack 1413 win 8472 (DF) (ttl 64, id 17892, len 40)
13:28:07.864275 66.102.11.104.80 > 192.168.1.69.58067: P
    1413:1573(160) ack 2312 win 32476 [tos 0x10] (ttl 53, id 2146, len 200)
13:28:07.864321 192.168.1.69.58067 > 66.102.11.104.80: . [tcp sum ok]
    2312:2312(0) ack 1573 win 11296 (DF) (ttl 64, id 17893, len 40)
13:28:07.877845 66.102.11.104.80 > 192.168.1.69.58067: P
    1573:2621(1048) ack 2312 win 32476 [tos 0x10]
    (ttl 53, id 2159, len 1088)
13:28:07.877911 192.168.1.69.58067 > 66.102.11.104.80: . [tcp sum ok]
    2312:2312(0) ack 2621 win 14120 (DF) (ttl 64, id 17894, len 40)
13:28:07.887977 66.102.11.104.80 > 192.168.1.69.58067: FP
    2621:3417(796) ack 2312 win 32476 [tos 0x10]
    (ttl 53, id 27843, len 836)
13:28:07.920812 192.168.1.69.58067 > 66.102.11.104.80: . [tcp sum ok]
    2312:2312(0) ack 3418 win 16944 (DF) (ttl 64, id 17895, len 40)
13:28:11.954544 192.168.1.69.58067 > 66.102.11.104.80: F [tcp sum ok]
    2312:2312(0) ack 3418 win 16944 (DF) (ttl 64, id 17896, len 40)
13:28:12.050846 66.102.11.104.80 > 192.168.1.69.58067: . [tcp sum ok]
    3418:3418(0) ack 2313 win 32476 (ttl 244, id 232, len 40)

Timing between packets, CWND and options management, the actual data
transfer and the checksums are part of every connection, so the space we can
use is really small: every field we could have used to put data in has an
RFC-imposed value, which the TCP/IP stacks will enforce.

Still, you have to seek before you find, and if we analyze everything that
happens in those short exchanges, we see which fields are involved...

The things that must not be touched for a session to run properly are:

IP header: addresses, fragmentation, flags, checksum, lengths, version
           and IHL.
TCP header: sequence and acknowledgment numbers, flags, the service
           (destination) port, checksum.
DATA: the payload must be the same for every connection; we can't serve
           different pages depending on which source IP is asking.

At the same time, over the years it has been observed that operating systems
implement their TCP/IP stacks in different ways, so different that by
analyzing certain combinations of fields an operating system can be
fingerprinted remotely, by either active or passive scanners. These
differences are worth studying, although understanding how a port or a
sequence number is generated is unlikely to reveal space to hide data in.

The most useful thing I saw concerns OpenBSD and GRsecurity: they randomize
some normally incremental values for security purposes.
A process pid, for example, is random by default on OpenBSD and selectable
through sysctl on FreeBSD, while it is incremental on Linux.

Some networking elements are affected by this randomization fixation too:
the initial sequence number (already randomized to prevent IP spoofing,
after the statistical analyses presented in
http://razor.bindview.com/publish/papers/tcpseq.html ; the generation
algorithms have since been revised, as you can see in
http://lcamtuf.coredump.cx/newtcp/), the source port used to start a
connection (or hidden by NAT), and the IP ID (which identifies a packet and
is used when the packet is fragmented: all fragments carry the same
identifier).

The last two values are usually incremental, and both are 16 bit fields.
Once the source port of a session has been chosen it won't change, while the
IP ID changes with every packet.

In my opinion, this is the spot to use in order to apply steganography to a
session while making it pass for a trivial standard session, so that it
cannot be noticed at all.

Since in theory a steganographic message cannot be detected, and its carrier
cannot be told apart from other similar carriers, we often have to design a
stegosystem relying on just the few elements that let us transmit safely.

Two bytes per packet are really few, so compression is necessary (though
encryption is better, and it's free). By compressing or encrypting the data
we reduce the chance of generating sequences of repeating identifiers, and
we keep an analyst from detecting a nonrandom sequence in what is supposed
to be a random stream.

Usually the ID is incremental, i.e. each packet a host generates has its own
ID and the following packet has ID+1. Since the counter is shared by the
whole host, identifiers are spread across a number of different sessions,
and each individual session will not show sequential IDs but seemingly
random ones (albeit always increasing).
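The "two bytes per packet" budget can be put into numbers with a small sketch. It is only an illustration under stated assumptions: the helper names `packets_needed`, `pack_ids` and `unpack_ids` are hypothetical, and the explicit high-byte-first packing and the zero padding of an odd tail are choices made here for clarity, not something the article prescribes.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of packets needed to carry 'payload_bytes' at 2 bytes per ID. */
size_t packets_needed(size_t payload_bytes)
{
    return (payload_bytes + 1) / 2;
}

/*
 * Pack a payload into a series of 16-bit IDs, first byte in the
 * upper half. An odd trailing byte is padded with zero (assumption).
 * 'ids' must have room for packets_needed(len) entries.
 * Returns the number of IDs written.
 */
size_t pack_ids(const unsigned char *data, size_t len, uint16_t *ids)
{
    size_t i, n = 0;

    for (i = 0; i < len; i += 2) {
        uint16_t hi = data[i];
        uint16_t lo = (i + 1 < len) ? data[i + 1] : 0;
        ids[n++] = (uint16_t)((hi << 8) | lo);
    }
    return n;
}

/*
 * Inverse of pack_ids: writes 2 * n bytes to 'out' and returns that
 * count. The receiver cannot tell padding from payload, which is one
 * reason to wrap the data in a self-delimiting format (gzip, gpg).
 */
size_t unpack_ids(const uint16_t *ids, size_t n, unsigned char *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(ids[i] >> 8);
        out[2 * i + 1] = (unsigned char)(ids[i] & 0xff);
    }
    return 2 * n;
}
```

With these numbers, the 1500 bytes that cost 1500 SYN packets in the earlier excerpt need `packets_needed(1500)`, i.e. 750 packets, and those packets can belong to a session that would have existed anyway.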
OpenBSD and GRsecurity introduced random IDs (OpenBSD by default, while
GRsec and FreeBSD need it to be enabled). Thus the appearance of random IDs
can be explained by the use of these operating systems and patches, and need
not be a telltale of steganographic activity going on.

We developed a test framework named "innova"
(http://www.s0ftpj.org/projects/innova/), which allows us to manipulate or
analyze the packets of our own sessions in a way that is transparent to the
kernel and to userspace applications.

As is well known, when an application asks for a socket, it is the kernel
that actually creates it. Afterwards, when the application asks to connect
that socket, the kernel performs the three-way handshake on behalf of the
application. From this point on, the application sends data through the
socket and leaves the construction of packets, their transmission and the
acknowledgments to the kernel.

Innova intercepts outgoing packets after the kernel has finished with them,
and can thus manipulate and study what the TCP/IP stack does in a
transparent and independent way. It works in userspace and not in kernel
mode (although a minimal kernel-mode version will soon be implemented), and
the analysis/manipulation options are handled by plugins, to make the
framework more flexible.
Since a likely application is transparent steganography, we implemented a
demo plugin that does just that.

<-| stego/ip_steganography.c |->
/*
 * innova plugin, coded by Claudio Agosti vecna@s0ftpj.org
 *
 * Mon Oct 13 22:08:31 2003, finished Thu Jan 29 18:35:13 2004
 * (I like lost my time)
 *
 * ip_steganography implements transparent steganography
 * on ip packets, using the ip id field.
 *
 * some operating systems, and others with special patches, implement
 * random ip id generation. this can be a nice way to
 * hide data inside a connection; using this innova plugin it can
 * be applied transparently to your common client sessions.
 *
 * http://www.gnupg.org
 * http://www.gzip.org
 * http://mcrypt.sf.net
 *
 * this software should be used to encrypt and compress a file before
 * sending it with this plugin. the options passed to this plugin must
 * be "/dump_directory /input_directory"
 * the dump_directory is filled with ip.ip.ip.ip-port dumps
 * of incoming sessions; in the input_directory the plugin looks for a
 * file named after the ip.ip.ip.ip it is connecting to and, if found,
 * uses it as input, taking two bytes at a time to put in the ip->id field.
 *
 * (http://www.s0ftpj.org :) for better information you should
 * search information on http://claudio.itapac.net
 */

/*
 * NOTE: the system header names were lost in this copy of the file;
 * the list below is an assumption, reconstructed from the calls the
 * plugin makes. See the original tarball for the author's list.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/param.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#include "innova.h"

static char *plug_desc = "ip steganography";
static char *dump_path, *source_path;

#define MAXSEQ 4

struct stgcouple_track
{
    unsigned int addr, counter;
    unsigned short port;
    FILE *dump;
    char *fname;
    unsigned int last_seq[MAXSEQ];
};

/* maximum number of connections tracked in each direction */
#define MAXTRAKSEX 20
static struct stgcouple_track incoming_track[MAXTRAKSEX];
static struct stgcouple_track outgoing_track[MAXTRAKSEX];

#define INCOMING 1
#define OUTGOING 2

char *get_io_desc(int *plugin_version)
{
    *plugin_version = PLUGIN_FORMAT;
    return plug_desc;
}

int mangle_init(struct innova_struct *is)
{
    /*
     * return < 0 is an error, and innova breaks,
     * return 0 asks to repeat mangle_init,
     * return > 0 means init succeeded, innova continues happy
     */
    return 1;
}

int mangle_cleanup(struct innova_struct *is, int error)
{
    int i;

    printf("forced closing...\n");

    for (i = 0; i < MAXTRAKSEX; i++) {
        if (incoming_track[i].dump != NULL) {
            printf("incoming file %s, %d packet\n",
                incoming_track[i].fname,
                incoming_track[i].counter
            );
            fclose(incoming_track[i].dump);
            incoming_track[i].dump = NULL;
        }
    }

    for (i = 0; i < MAXTRAKSEX; i++) {
        if (outgoing_track[i].dump && !feof(outgoing_track[i].dump)) {
            printf("outgoing opened session %s, "
                "ending without be finish (%d byte sent)\n",
                outgoing_track[i].fname,
                outgoing_track[i].counter
            );
            fclose(outgoing_track[i].dump);
            outgoing_track[i].dump = NULL;
        }
    }

    return 1;
}

/* find the tracked session this packet belongs to, if any */
static inline struct stgcouple_track *
get_session(int who, struct innova_packet *pkt)
{
    unsigned int i;
    struct stgcouple_track *list;

    if (who == INCOMING)
        list = incoming_track;
    else
        list = outgoing_track;

    for (i = 0; i < MAXTRAKSEX; i++) {
        if (who == INCOMING)
            if (pkt->ip->saddr == list[i].addr &&
                pkt->tcp->source == list[i].port)
                return &list[i];

        if (who == OUTGOING)
            if (pkt->ip->daddr == list[i].addr &&
                pkt->tcp->dest == list[i].port)
                return &list[i];
    }
    return NULL;
}

/* first unused slot of the static table, NULL when the table is full */
static inline struct stgcouple_track *get_next_free(int who)
{
    unsigned int i;
    struct stgcouple_track *list;

    if (who == INCOMING)
        list = incoming_track;
    else
        list = outgoing_track;

    for (i = 0; i < MAXTRAKSEX; i++)
        if (list[i].dump == NULL)
            return &list[i];

    return NULL;
}

#define DUPSEQ 0
#define NEWSEQ 1
/* xor is used because seq and ack_seq should not both stay unchanged */
static inline int check_seq(unsigned int *seqlist, unsigned int last_xor)
{
    unsigned int i;

    for (i = 0; i < MAXSEQ; i++) {
        if (seqlist[i] == last_xor)
            return DUPSEQ;
    }

    /* not seen yet: shift the history and remember the new value */
    for (i = (MAXSEQ - 1); i > 0; i--)
        seqlist[i] = seqlist[i - 1];

    seqlist[0] = last_xor;

    return NEWSEQ;
}

int local_mangle(struct innova_struct *is, struct innova_packet *pkt)
{
    struct stgcouple_track *tracking = get_session(OUTGOING, pkt);

    if (tracking == NULL && pkt->tcp->syn) {
        char fname[MAXPATHLEN];
        int i;

        if ((tracking = get_next_free(OUTGOING)) == NULL) {
            printf("outgoing session overcoming limit of %d\n",
                MAXTRAKSEX);
            return PACKET_OK;
        }

        /* snprintf keeps an overlong source_path from overflowing fname */
        snprintf(fname, sizeof(fname), "%s/%s", source_path,
            inet_ntoa(*(struct in_addr *)&pkt->ip->daddr));

        if ((tracking->dump = fopen(fname, "r")) == NULL)
            return PACKET_OK;

        printf("opened [%s] file for be sent\n", fname);

        tracking->fname = strdup(fname);
        tracking->addr = pkt->ip->daddr;
        tracking->port = pkt->tcp->dest;

        for (i = 0; i < MAXSEQ; i++)
            tracking->last_seq[i] = 0x00;

        /* slots are reused, so the packet counter must restart */
        tracking->counter = 0;
    }

    if (tracking == NULL)
        return PACKET_OK;

    if (pkt->tcp->fin || pkt->tcp->rst) {
        printf("closing session [%s] packets sent %d\n",
            tracking->fname, tracking->counter);

        if (!feof(tracking->dump)) {
            printf("ERROR! file %s not totally sent, "
                "%d byte only\n",
                tracking->fname,
                tracking->counter * 2
            );
        }

        fclose(tracking->dump);
        tracking->dump = NULL;
        tracking->addr = tracking->port = 0;

        return PACKET_OK;
    }

    /* to avoid duplicated packets (retransmissions) */
    if ((check_seq(tracking->last_seq,
        pkt->tcp->ack_seq ^ pkt->tcp->seq)) == DUPSEQ)
        return PACKET_OK;

    if (feof(tracking->dump)) {
        printf("session %s sent after %d byte\n",
            tracking->fname,
            tracking->counter
        );

        fclose(tracking->dump);
        tracking->dump = NULL;
        tracking->addr = tracking->port = 0;
    } else {
        /* the next two bytes of the file become this packet's IP ID */
        fread(&pkt->ip->id, 1, sizeof(unsigned short), tracking->dump);
        tracking->counter++;
    }

    return PACKET_OK;
}

int remote_mangle(struct innova_struct *is, struct innova_packet *pkt)
{
    struct stgcouple_track *tracking = get_session(INCOMING, pkt);

    if (tracking == NULL && pkt->tcp->syn) {
        char fname[MAXPATHLEN];
        int i;

        if ((tracking = get_next_free(INCOMING)) == NULL) {
            printf("ingoing session overcoming limit of %d\n",
                MAXTRAKSEX);
            return PACKET_OK;
        }

        /*
         * the original used sprintf here ("YES, THIS IS A BUFFER
         * OVERFLOW!, root could became root."); snprintf bounds it.
         */
        snprintf(fname, sizeof(fname), "%s/%s-%d", dump_path,
            inet_ntoa(*(struct in_addr *)&pkt->ip->saddr),
            ntohs(pkt->tcp->source)
        );

        if ((tracking->dump = fopen(fname, "w+")) == NULL) {
            printf("unable to open file %s!!\n", fname);
            return PACKET_OK;
        }

        tracking->fname = strdup(fname);
        tracking->addr = pkt->ip->saddr;
        tracking->port = pkt->tcp->source;

        for (i = 0; i < MAXSEQ; i++)
            tracking->last_seq[i] = 0x00;

        tracking->counter = 0;
    }

    /* untracked session */
    if (tracking == NULL)
        return PACKET_OK;

    /* to avoid duplicated packets (retransmissions) */
    if ((check_seq(tracking->last_seq,
        pkt->tcp->ack_seq ^ pkt->tcp->seq)) == DUPSEQ)
        return PACKET_OK;

    /*
     * sorry for stressing your scheduler calling a lot of system calls,
     * but this is only a demonstration...
     * static connection table, check only the last sequence number,
     * and other non-performantic things :)
     */
    if (pkt->tcp->fin || pkt->tcp->rst) {
        printf("closing session [%s] packets received %d\n",
            tracking->fname, tracking->counter);

        fclose(tracking->dump);
        tracking->dump = NULL;
        tracking->addr = tracking->port = 0;
    } else {
        /* append the two bytes carried by this packet's IP ID */
        fwrite(&pkt->ip->id, 1, sizeof(short), tracking->dump);
        tracking->counter++;
    }

    return PACKET_OK;
}

int io_timeexceed(struct innova_struct *is, int *timeout)
{
    return 0x00;
}

void stegano_print_help(void)
{
    fprintf(stderr,
        "steganography plugin simple required two path, one for "
        "incoming sessions dump\nthe other for search data to put"
        " inside the packets\nfirst: dump, eg /tmp\nsecond: input,"
        " eg /encrypted/ (with.any.ip.file) like "
        "/encrypted/192.168.0.1\n"
    );
}

/* returns nonzero only when path exists and is a directory */
static int check_path(char *path)
{
    struct stat st;

    if (path == NULL)
        return 0;

    if ((stat(path, &st)) == -1)
        return 0;

    return S_ISDIR(st.st_mode);
}

int option_parser(struct innova_struct *is, struct innova_options *iopt)
{
    if (iopt->plug_opt == NULL || !strcmp(iopt->plug_opt, "help")) {
        stegano_print_help();
        return 0xff;
    }

    dump_path = iopt->argv[0];
    source_path = iopt->argv[1];

    if (!check_path(dump_path) || !check_path(source_path)) {
        stegano_print_help();
        innova_p(FATAL, "invalid path passed as option");
    }

    return 0x00;
}
<-X->

How does it work?

For the generic documentation of innova, we refer you to the complete source
code and documentation, which can be downloaded as a package from
http://www.s0ftpj.org/projects/innova/

The framework is in its early release phase, so it is expected to show
problems and faults. Bug reports and the like will be gratefully accepted.

Let's look at a simple possible use of the steganography plugin in a case
study.
Server:

Let us suppose that we want to use a web server to pass data to a client in
a hidden way, and let us suppose, just for simplicity, that the server
already knows the client's IP. (The plugin is just an example: it needs to
know the IP in advance, though it can be extended quite easily. As you can
see from the plugin, the functions the framework looks for are the most
intuitive ones I could find for a traffic manipulation system.)

Our file will be:

gw@/tmp/innova-0.0.1# man strtoul > secret
gw@/tmp/innova-0.0.1# md5sum secret
ad7b9c8997544c7f4188869457c42118  secret
gw@/tmp/innova-0.0.1# gzip -9 secret
gw@/tmp/innova-0.0.1# ls -l secret.gz
-rw-r--r--    1 root     root         1299 Feb  4 00:20 secret.gz
gw@/tmp/innova-0.0.1# mv secret.gz /tmp/192.168.1.69
gw@/tmp/innova-0.0.1# ./innova -p tcp -l 80 -o "/tmp /tmp" \
        -m plugins/ip_steganography -i eth1 192.168.1.69
innova start
init of plugin /tmp/innova-0.0.1/plugins/ip_steganography.so: ip steganography

Client:

The client wants to download a file from a web server: to do that, it runs
wget with the URL and downloads it. Actually, it is not interested in the
file itself: it wants to establish a session with the remote server and send
it a steganographic message.

The message is:

schlafen@/tmp$ cat secret
biscotti con frolla al parmiggiano: 500 grammi di farina, 350 di burro,
un tuorlo e un uovo intero, 190 grammi di zucchero (frolla normale) o
190 di PARMIGGIANO (frolla al parmiggiano) cuocere a 180 gradi per 20
minuti circa.

sembra strano, sembra fusion, ma sono buoni.

antani sia con te e benedica la tua via steganografica, fratello cuoco.

amen!
schlafen@/tmp$ md5sum secret
9eaf187b268f8ab67888178ab381534d  secret
schlafen@/tmp$ gzip -9 secret -c > 192.168.1.1
schlafen@/home/vecna/wrk/innova-0.0.1# ./innova -p tcp -r 80 \
        -m plugins/ip_steganography -o "/tmp /tmp/" -i eth0 192.168.1.69
innova start
init of plugin /root/wrk/innova-0.0.1/plugins/ip_steganography.so: ip steganography

Once innova is running the client starts too, and performs the network
session that acts as cover, which looks like a normal download:

schlafen@~$ wget http://192.168.1.1/film/southpark-matrix.mpg
--01:26:57--  http://192.168.1.1/film/southpark-matrix.mpg
           => `southpark-matrix.mpg'
Connecting to 192.168.1.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25,050,957 [video/mpeg]

100%[====================================>] 25,050,957     3.88M/s ETA 00:00

01:27:02 (4.33 MB/s) - `southpark-matrix.mpg' saved [25050957/25050957]

innova intercepts the outgoing connection and checks whether /tmp contains a
file named after the IP being contacted; since secret.gz was written out as
192.168.1.1, innova opens it and uses it as the source. All packets
belonging to a session towards 192.168.1.1 will carry the contents of that
file, two bytes at a time, in ip->id. The application performs the connect,
the kernel performs the three-way handshake.

Then there are the packets the client receives. While innova is running,
each packet matching the innova rules (in this case, -r 80 denotes the
remote port and 192.168.1.1 the host) is handled by the plugin.
For each session, the plugin opens in the dump directory a file named
/directory/ip-port, in which the IDs of the received packets are written.

session /tmp/192.168.1.1 sent after 130 byte
closing session [/tmp/192.168.1.1-80] packets received 17302
forced quit for signal: 2
forced closing...
Terminated
schlafen@/tmp# file 192.168.1.1-80
192.168.1.1-80: gzip compressed data, was "secret", from Unix, max compression
schlafen@/tmp# mv 192.168.1.1-80 /tmp/secret.gz
schlafen@/tmp# gzip -d secret.gz

gzip: secret.gz: decompression OK, trailing garbage ignored
schlafen@/tmp# file secret
secret: ASCII English text, with overstriking
schlafen@/tmp# md5sum secret
ad7b9c8997544c7f4188869457c42118  secret

Server:

The server runs an http service; whenever it receives a connection it dumps
the series of ip->id values, knowing that one of those series is likely to
contain a steganographic session (at first they are all dumped, and each
dump is analyzed later on, since in real conditions they would be encrypted,
not just gzipped).
Moreover, when it sees an outgoing session from the monitored service, it
checks whether there is a file named after the IP of the client that
contacted it. If there is, that file is used as the source:

opened [/tmp//192.168.1.69] file for be sent
session /tmp//192.168.1.69 sent after 650 byte
closing session [/tmp//192.168.1.69-36227] packets received 5254
Terminated
gw@/tmp/innova-0.0.1# file /tmp/192.168.1.69-36227
gw@/tmp/192.168.1.69-36227: gzip compressed data, deflated, original filename,
last modified: Wed Feb  4 00:29:38 2004, max compression, os: Unix
gw@/tmp/innova-0.0.1# cd ..
gw@/tmp# mv 192.168.1.69-36227 received.gz
gw@/tmp# gzip -d received.gz

gzip: received.gz: decompression OK, trailing garbage ignored
gw@/tmp# md5sum received
9eaf187b268f8ab67888178ab381534d  received

Thus the mpeg download has turned into a bidirectional channel for
exchanging hidden data, while downloading the same file everybody downloads,
with no modifications.


2) STEGANALYSIS APPLIED TO REAL TRAFFIC

The analysis now narrows down to one question: "is a session carried by a
random ID series so uncommon that it raises suspicion, or is it covered by
average traffic?"

So, asking here and there, I got enough traffic to analyze it decently: for
such an analysis, it is enough to study how the IP ID increases for each
host.
I used several throwaway scripts and code fragments, and I'm not going to
list them. In short, from tcpdump lines such as:

74.19.61.75 > 131.192.48.6: [|tcp] (DF) (ttl 46, id 21959, len 92)
74.19.61.79 > 131.192.48.6: [|tcp] (DF) (ttl 45, id 45841, len 65)

I went to:

74.19.61.75 131.192.48.6 21959
74.19.61.79 131.192.48.6 45841

and ended up with an array of IDs for each source IP, keeping track of all
the ID increases greater than 1 (the standard increase).

When acquisition is complete, I compare the sum of the increases with the
number of packets, which gives the average increase per packet.

The average increase (a value between 1 and 40000) is then rounded and
scaled onto a 100-item array, to see its distribution and understand whether
the average increase is mostly high or mostly low.

Surely this is not the best way to describe the traffic. Still, if a session
shows an extremely high increase it will land above the last value this
analysis finds for "normal" sessions; and if normal sessions are spread over
a wide range, an IP ID session used as a steganographic container is harder
to recognize.
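The procedure described above can be sketched as follows. This is not one of the scripts actually used, which are not published here; the names `average_id_increase` and `increase_bucket` are hypothetical, and two details are assumptions of this sketch: increases are taken modulo 65536 so that a counter wrapping past 65535 still counts as a small step, and averages above 40000 are clamped into the last slot.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Average increase between consecutive IP IDs seen from one host.
 * Assumption: each step is the forward distance modulo 2^16, so
 * 65535 -> 1 counts as an increase of 2, not as a huge jump.
 */
double average_id_increase(const uint16_t *ids, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    if (n < 2)
        return 0.0;
    for (i = 1; i < n; i++)
        sum += (uint16_t)(ids[i] - ids[i - 1]);
    return (double)sum / (double)(n - 1);
}

/*
 * Scale an average increase in the range 0..40000 onto a 100-slot
 * histogram index. Values outside the range are clamped (assumption).
 */
int increase_bucket(double avg)
{
    int b = (int)(avg * 100.0 / 40000.0);

    if (b < 0)
        b = 0;
    if (b > 99)
        b = 99;
    return b;
}
```

A host whose IDs go 100, 101, 103, 106 has an average increase of 2 and falls in slot 0, while an average of about 19000 falls in slot 47. Counting hosts per slot yields a 100-entry distribution of the kind the analysis relies on.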
This is array dump of ratios proportioned to 100 items: 0) 955 10) 38 20) 24 30) 11 40) 6 50) 1 60) 0 70) 0 80) 0 90) 0 1) 114 11) 47 21) 22 31) 18 41) 1 51) 2 61) 0 71) 0 81) 0 91) 0 2) 99 12) 18 22) 28 32) 6 42) 9 52) 3 62) 0 72) 1 82) 0 92) 0 3) 60 13) 39 23) 28 33) 13 43) 2 53) 1 63) 0 73) 0 83) 0 93) 0 4) 62 14) 33 24) 18 34) 4 44) 5 54) 2 64) 0 74) 0 84) 0 94) 0 5) 71 15) 23 25) 12 35) 9 45) 7 55) 2 65) 0 75) 0 85) 0 95) 0 6) 53 16) 21 26) 27 36) 6 46) 2 56) 1 66) 0 76) 0 86) 0 96) 0 7) 47 17) 29 27) 12 37) 4 47) 5 57) 0 67) 0 77) 0 87) 0 97) 0 8) 42 18) 25 28) 15 38) 6 48) 6 58) 0 68) 0 78) 0 88) 0 98) 0 9) 31 19) 22 29) 16 39) 5 49) 4 59) 0 69) 0 79) 0 89) 0 99) 1 By analyzing exchange session (to compare them, I considered data exchange to happen only from 192.168.1.1 to 192.168.1.69, not the opposite, in order to study steganographic session related to a normal one) and I remarked average increase to be about 19000 units. This analysis appears to be valid, though, thinking again to steganography running on IP ID and knowing there can be a highly random transmission only when session begins, followed by a fall, I tried to divide sessions in 4 groups, depending on the number of packets they contains, and to generate a graphic taking as X packet number and as Y ip id value. We can thus see which sessions have an increasing trend and which one are random (the second ones will be scattered on cartesian axes, the first ones trend will be increasing till it reaches a maximum point when unsigned short describing them overflows and counting starts again). At last, some .png graphs attached to this article (./graph/) have been published: there are traffic graphs[1-7].png generated by gnuplot, showing some examples of normal traffic; steganotraffic.png showing traffic generated by innova session. It's easy to understand, but I also remarked some machines included into analysis (are they OpenBSD or linux + GRsec?) 
to have random IP IDs, so I still suggest, if you want to use this technique,
running it on a system that generates random IDs, so that a statistical
analysis can't spot any difference.


3) LET'S START MENTAL ILLNESS

While thinking back to my last talk at e-privacy 2003, which dealt with
steganography, I remembered naif asking me a question during the Q&A session.
I don't remember how I answered at the time, but now I think I have the final
(I hope) answer.

Question:
"Is it possible to create steganographic techniques that resist steganalysis
and whose resistance does not depend on whether the related specifications
are public? For example, when steganography is implemented within a mail
client that is widespread, with the feature built in and the specifications
public, is it really possible to steganograph e-mail exchanges, or anything
else, while avoiding steganalysis?"
(http://www.s0ftpj.org/docs/ep2003_steganografia_vecna.ogg)

When does steganalysis work? It works when it can tell that a datum, whatever
it is, has been modified in order to carry information.

With a little imagination it is easy to find ways to hide data: almost every
container can do it, more or less efficiently.

Steganography is most easily imagined on images: by replacing with arbitrary
data the details the human eye can't notice, the image doesn't appear to have
changed, yet it hides data that anyone who knows how to decode them can
retrieve. Images are well suited to this: in a photo of a green meadow,
nobody is likely to notice that the least significant bytes of the grass
aren't the real ones. Similarly, steganographing an mp3 by a heavy metal band
rather than one by a solo pianist will probably hide our changes, no matter
how invasive they are.
It's natural to think about using other multimedia containers: they are well
suited, since they usually carry more definition than human senses can fully
appreciate, so the parts that are "too accurate" or "unnecessary", and which
can hold any value anyway because they are acquired from external devices
subject to errors and noise, can be replaced with arbitrary data.

Multimedia objects aren't the only suitable ones, though: ALL KINDS OF DATA,
one way or another, can be steganographic containers. What mostly
differentiates them is capacity (if an image of X bytes can hold Y bytes of
data, while another X-byte file holds Y/10... we'll choose the image), since
we are used to thinking of steganography as applied to the least significant
bits.

Every datum we can generate can be used as a steganographic container:
-EACH FILE-, -EACH DATUM-, can be used, even though the final content stays
the same. We just need to look at it differently, from the point of view of
information encoding.

The only limit is imagination; here are some examples to point it in the
right direction :)

Steganography can be applied to any transmission system.

Let's imagine we have an HTML file: we take each character and either make it
bold or not. We end up with a text file that looks like someone wasted their
time playing with tags. For those who know how to interpret that file,
however, a bold char means 1 and a normal one means 0; reading the sequence
8 characters at a time (bold/non-bold), each group yields one hidden byte
extracted from the file/message.

This stegosystem is quite odd, easily detected, of little practical use and
so on, yet it serves as an example to say: "if we can generate files and move
within the options available to us as ordinary users, then, depending on
whether those options are used or not, we can hide data without modifying the
final content".

HTML also suits the next example, which actually applies to every word
processing format...

Let's imagine creating a document containing some tabs.
This key just moves the cursor to the next position that is a multiple of 8.
If we are at the top of the document, position 0, and we hit tab... we arrive
at 7. If we are at position 1... we still arrive at 7. :)

If, at position 0, we set the font to "times new roman" and hit the spacebar,
then at the next position set "courier" and hit space again, then hit tab and
switch back to the font we are writing the document in, we end up at position
7 with the same font we would have had anyway. However, there are 2 extra
bytes: they can't be seen, but they are there. Why not use them to hide data?
Each tab gives us up to 7 bytes, which we can fill with spaces. If our word
processor has 80 different fonts, we get 80^7 possible combinations (just as
a byte has 2^8, 8 positions with 2 states each): that is 2.097152 * 10^13, a
number that takes 45 bits to write down and is a bit more than 2^44. That
makes 44 freely usable bits, so 5 complete bytes hidden just by changing
fonts, which is quite good :)

If we use an automatic system to insert/extract information from such
documents, be they html, pdf or ps, we apparently haven't changed anything,
yet we have achieved our goal.

Very nice, BUT:

If every stegosystem is studied and attacked, and a system to detect it is
built, then almost all stegosystems turn out to be vulnerable. Why? Suppose
we statistically analyze pages, studying how frequently tags are used and the
distance between one tag and the next: we take a large number of pages from
Google and compute statistics to define a "model", a value or a series of
values describing average use... then we can use this reference to analyze
any page and find out whether something is wrong: some hits will be false
positives, but others will not.

The same goes for the document with odd fonts: under statistical analysis we
would fail. Our document would be the only one with that peculiarity,
invisible but significant for an automatic analysis precisely because it's so
uncommon.
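
The capacity arithmetic of the font example can be checked with a few lines
of C. This is only a sketch: capacity_bits is a made-up helper, not part of
any tool mentioned in this article. It returns the number of whole bits that
can be chosen freely, which is one less than the number of bits needed to
write down the count of combinations whenever that count is not a power of
two.

```c
#include <stdint.h>

/* Whole bits that can be encoded with `slots` positions offering `choices`
 * options each, i.e. floor(log2(choices^slots)).
 * Returns -1 if an argument is zero or if the number of combinations does
 * not fit in 64 bits. */
int capacity_bits(uint64_t choices, unsigned slots)
{
    uint64_t combos = 1;
    int bits = 0;
    unsigned i;

    if (choices == 0 || slots == 0)
        return -1;

    for (i = 0; i < slots; i++) {
        if (combos > UINT64_MAX / choices)
            return -1;          /* choices^slots would overflow */
        combos *= choices;
    }

    while (combos > 1) {        /* integer floor(log2(combos)) */
        combos >>= 1;
        bits++;
    }
    return bits;
}
```

With 80 fonts and 7 filler characters, capacity_bits(80, 7) gives 44, that is
5 complete bytes per tab plus 4 spare bits; capacity_bits(2, 8) gives the
familiar 8 bits of a byte.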
Of course, the security of a stegosystem must be tested by attacking it; to
understand how attackable a system is, it must be studied and understood,
that's why we deal with simple things. :) jpeg DCT steganography is more
complex, though not necessarily the most secure or efficient, and since it is
not obvious how to attack it, doing so is harder than with other systems.

SNOW (steganographic nature of white-space: http://www.darkside.com.au/snow)
is a piece of software that steganographs by adding spaces (" ":) at the end
of lines. At first sight this technique looks easy to criticize, since there
is usually no reason to deliberately put spaces at the end of lines in a
document, so a document showing.......................

The nicest thing about human randomness is that it can't be predicted.

~/txt/zine$ find . -name 'BF*' -exec file {} \; | grep ASCII | \
            awk {'print $1'} | sed -es/:// > /tmp/bfitxtfiles
~/txt/zine$ for i in `cat /tmp/bfitxtfiles`; \
            do x=`grep -c " $" $i` && y=`wc -l $i | awk {'print $1'}` \
            && echo "$(($y / $x))" ; done | sort -g | column
1       7       11      15      20      22      25      32      38      158
2       9       11      16      20      22      27      33      45      167
2       10      11      16      20      22      27      33      49
3       10      12      16      21      23      28      33      51
5       10      12      16      21      23      28      33      59
5       10      12      16      21      23      28      34      62
6       10      13      17      21      24      29      34      84
7       10      14      18      22      25      29      36      117
7       11      15      18      22      25      30      37      118

Let's do the same for phrack :)

~/txt/zine$ for i in `cat /tmp/phracktxtfiles`; \
            do x=`grep -c " $" $i` && y=`wc -l $i | awk {'print $1'}` \
            && echo "$(($y / $x))" ; done | sort -g | column
1       4       6       9       15      22      32      53      85      220
1       4       7       12      16      22      32      54      90      235
1       4       7       12      17      24      34      54      90      237
1       4       7       13      18      26      36      66      95      342
1       4       7       13      18      26      37      73      126     350
2       4       8       13      19      26      38      75      127     466
2       5       8       13      19      28      40      76      128     494
2       5       8       14      19      28      43      80      136     603
3       5       8       15      20      30      43      83      141     660
3       5       8       15      21      30      44      84      179     665
3       5       9       15      21      31      45      84      214     1261
3       6       9       15      22      31      48      84      219

(Unix is really powerful)

The numbers listed above are the ratio

        total_file_line_number / lines_ending_in_space

for each file. I expected to find
much higher numbers; instead, it turns out it's not unusual to find space
characters at the end of lines in a text file. Given this, SNOW can be
considered a fairly strong stegosystem, since what it does is also done by
users. We have no way to tell whether a space was added while formatting the
text or during the steganographic process.

~/steganografia/snow$ man bash > contenitore
Reformatting bash(1), please wait...
~/steganografia/snow$ man ./snow.1 > secret
Reformatting snow.1, please wait...
~/steganografia/snow$ wc -l contenitore secret
   4517 contenitore
    113 secret
   4630 total
~/steganografia/snow$ grep -c " $" contenitore secret
contenitore:0
secret:0
~/steganografia/snow$ ./snow -C -p "antani" -f secret contenitore stegobj
Compressed by 35.47%
Message used approximately 85.15% of available space.
~/steganografia/snow$ grep -c " $" stegobj
1544
~/steganografia/snow$ ls -l secret contenitore stegobj
-rw-r--r-- 1 vecna vecna 300633 Feb 2 23:53 contenitore
-rw-r--r-- 1 vecna vecna   5330 Feb 2 23:54 secret
-rw-r--r-- 1 vecna vecna 340474 Feb 2 23:56 stegobj

Here we get a ratio of 2.9 (4517 lines against 1544 lines ending in a space):
it's a frequent value, and even lower ones occur in non-steganographic files.
If we were to treat this value as suspicious we'd get too many false
positives: the system is strong enough because it's based on something the
user may or may not have done, so the presence of space characters does not
prove that SNOW was used. :)

This is basically the same idea that steganography in images and music relies
on. Capturing with devices that record a higher definition than human senses
(sight, hearing or both) can perceive means that a small part (the least
significant bits) can be replaced with arbitrary data; the container itself
can then help hide it or not (a heavy metal band's .mp3 will be more suitable
than a classical piano performance, just as a picture of a grass field is
more suitable than a portrait).
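
The line counting done above with grep and wc can also be sketched in C, for
instance as a starting point for an automatic check. The function names are
made up for this example; like `wc -l`, the sketch only counts lines
terminated by a newline, and like the shell loop it uses integer division.

```c
#include <stdio.h>

/* Count the newline-terminated lines of fp and how many of them end with
 * a space character (the pattern matched by grep " $"). */
void scan_trailing_spaces(FILE *fp, long *total, long *trailing)
{
    int c, prev = '\n';

    *total = 0;
    *trailing = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            (*total)++;
            if (prev == ' ')
                (*trailing)++;
        }
        prev = c;
    }
}

/* total_file_line_number / lines_ending_in_space, integer division.
 * Returns -1 when no line ends in a space: the shell loop above skips
 * such files, since grep -c exits with an error when it counts zero. */
long space_ratio(long total, long trailing)
{
    if (trailing <= 0)
        return -1;
    return total / trailing;
}
```

With the figures of the SNOW session above, space_ratio(4517, 1544) gives 2,
the integer part of the 2.9 ratio.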
In practical terms this steganographic system would be safe, but as we have
already seen these techniques are exposed to statistical steganalysis
(http://www.outguess.org). Filling 90% of a container instead of 10%
certainly affects how easily it can be analyzed.

How can we find a good steganographic container, then?

- if we can hardly find an injection system that resists statistical
  analysis (using too many manipulation options can create unique cases that
  stand out as singularities -- as with text fonts, spaces...)
- if the more frequently a container is used, the higher the risk of it being
  analyzed and detected (jpeg and more, outguess)
- if the action points are many, that is, every point where the user can
  modify something can be used (html, CAD?)

then a potential solution is to write software working at layer 6 that finds
ALL the formats/protocols (html, jpeg, gif, html sources with snow, color
options, presence and order of particular elements within a page, LSBs inside
images) available in the medium we are using to publish information (e.g.
mail or web), then sorts all these potential containers and applies
mass-steganography to them, splitting the content so that each one is used as
sparingly as possible. This way each element holds too little information to
be pinpointed as a container on its own, but all the pieces together reveal
the hidden data once decoded and decompressed.

Using different techniques helps raise the false positive count for each
technique, bringing us closer to a system that is strong (given the limited
use of each container) and hard to attack (with 4 data injection systems an
analyst has to run 4 analyses, and will probably get many more false
positives).

There's no doubt that writing the handlers for all these formats gets
harder...
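
To make the "use each container as sparingly as possible" idea concrete, here
is a purely hypothetical sketch in C of how a payload could be split across
several containers in proportion to their capacity. Nothing like plan_split
exists in the code shipped with this article; the function, its parameters
and the percentage limit are assumptions made only for illustration.

```c
#include <stddef.h>

/* Split `payload` bytes over n containers in proportion to their capacity
 * cap[], so that every container is filled to about the same low fraction.
 * Fails with -1 if the payload would need more than max_percent of the
 * total capacity; otherwise fills out[] with the bytes assigned to each
 * container and returns 0.
 * Sketch only: the multiplications can overflow for very large values. */
int plan_split(const size_t *cap, size_t n, size_t payload,
               unsigned max_percent, size_t *out)
{
    size_t total = 0, assigned = 0, i;

    for (i = 0; i < n; i++)
        total += cap[i];

    if (total == 0 || payload * 100 > total * max_percent)
        return -1;

    for (i = 0; i < n; i++) {
        out[i] = cap[i] * payload / total;  /* share, rounded down */
        assigned += out[i];
    }

    /* hand out the few bytes lost to rounding, one per container */
    for (i = 0; assigned < payload; i = (i + 1) % n) {
        if (out[i] < cap[i]) {
            out[i]++;
            assigned++;
        }
    }
    return 0;
}
```

With four containers able to hold 1000, 500, 300 and 200 bytes and a 10%
limit, a 100-byte payload is split as 50, 25, 15 and 10 bytes, while a
300-byte payload is refused. A real engine would also have to decide the
order of the pieces and how the receiver finds them, which this sketch does
not address.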


4) THIS EXAMPLE REALLY ROCKS: SIMULATING COMMON TRAFFIC

While writing this article I was slightly unsatisfied, as I was offering just
a single piece of code on such a wide topic. Then I happened to run tcpdump
and collect some traffic, and remembered that for months I had been receiving
packets like this one:

13:51:37.806506 127.0.0.1.80 > 62.211.136.80.1575: R [tcp sum ok] 0:0(0) ack 1 win 0 (ttl 121, id 46859, len 40)

I had seen them before, but once I had classified them as "worm traffic" I
ignored them.

http://www.securityfocus.com/archive/75/335132/2003-08-21/2003-08-27/0
http://cert.uni-stuttgart.de/archive/incidents/2003/10/msg00143.html

Such a thing can't be ignored! This can be a fantastic method for network
steganography, where we can use the fields I ruled out in the previous
example, since these packets are not tied to any session -- and are explained
by a common phenomenon (they shouldn't raise eyebrows, everybody gets them :))

I received 4908 packets matching this rule:

    'ip and tcp port 80' and 'tcp[13] & 5 != 0' and 'src 127.0.0.1'

from 13:48 to 10:23 of the following day, an average of roughly one packet
every 14-15 seconds.

3060./tmp$ grep -c "ack 1" VIRUS.traffic.dump
4139
3061./tmp$ grep -vc "ack 1" VIRUS.traffic.dump
769
3062./tmp$

Most of these packets carry acknowledgment number 1, but 769 of them carry a
different value; destination ports vary, and the IP ID field is always free
for use.

This worm's traffic is ok. We can build a one-way system that sends
information which 'looks like' worm traffic. Since our data will be mixed
with all the other (real) worm packets, we have to find a way to tell them
apart: it seems we need a pre-shared key between the two peers (after all,
this code is only an example, this afternoon I'll watch Lord of the Rings and
it's time to finish this article...), so that we can mock up a sort of
authentication towards the server that receives our packets.

Usable fields, after a quick analysis of the captured traffic, are:

    ack_seq (4 bytes):           completely free
    id (2 bytes):                completely free
    destination port (2 bytes):  between 1000 and 2000

Once we have agreed on some sort of client authentication, we'll be
recognized by the destination port (always the same within a session, though
the port alone is not enough), while the remaining fields carry our data.

The client works on a raw socket and sends packets using IP_HDRINCL, while
the server reads at the datalink layer (since linux -- with its reverse path
protections -- won't let a packet carrying the address of one interface come
in through another) and decodes them this way:

- both server and client have a key, which is expanded as needed into a byte
  sequence used to authenticate packets

- the first packet uses the IP ID to store the length of the file we are
  going to transmit, tcp->ack_seq to store the first 4 bytes of the stream,
  and a destination port chosen at random within the common range.
  For each packet, the server checks whether ack_seq matches the start of one
  of its streams (the server can wait for more than one session at a time, so
  it holds more than one expanded stream); if it does, it saves the
  destination port, records the file length and opens a file whose suffix
  marks the session as incomplete.

- each packet with a recorded destination port and an IP ID whose first byte
  matches the next byte of the stream is considered valid: the second byte of
  the IP ID and the 4 bytes of ack_seq are appended to the server's dump
  file. When the file reaches the expected length (learned from the first
  packet) the session is closed and saved; until then it keeps its
  'incomplete' suffix.

<-| stego/blaster/Makefile |->
# blaster noise traffic steganography
# http://www.s0ftpj.org

CC= gcc -O2 -Wall

all: blaststegd blaststegsender
	@echo "remember to read comment and article!"

blaststegd: blaststegd.c blaststeg.h
	$(CC) blaststegd.c -o blaststegd

blaststegsender: blaststegsender.c blaststeg.h
	$(CC) blaststegsender.c -o blaststegsender

clean:
	rm -f blaststegd blaststegsender
<-X->

<-| stego/blaster/blaststeg.h |->
#include <string.h>     /* strlen, memset */

#define STREAMSIZE      1024
#define MAXKEYLEN       32
#define EXPANDROUND     6
#define MAXRSTSIZE      40
#define ACTVSTR         "(INCOMPLETE)"
#define BASEDELAY       14
#define MAXDELAY        15

/* expand the key on the STREAMSIZE with good distribution
 * (the key must not be an empty string) */
void compute_stream (unsigned char *stream, unsigned char *key)
{
    int j, i, k = 0, klen = strlen ((char *) key);

    memset (stream, 0x00, STREAMSIZE);

    for (j = 0; j < EXPANDROUND; j++)
    {
        /* every round after the first starts from a different key byte */
        if (j)
            k = (j % klen);

        for (i = 0; i < STREAMSIZE; i++)
        {
            stream[i] = (stream[i] << 4) ^ (key[k] + i);

            if (key[++k] == 0x00)
                k = 0;
        }
    }
}
<-X->

<-| stego/blaster/blaststegsender.c |->
/*
 * Sat Jan 31 10:55:55 CET 2004
 * vecna@s0ftpj.org
 *
 * example of steganography simulating common (worm) traffic,
 *
 * http://www.securityfocus.com/archive/75/335132/2003-08-21/2003-08-27/0
 * http://cert.uni-stuttgart.de/archive/incidents/2003/10/msg00143.html
 *
 * this code is part of a BFi article, go to www.s0ftpj.org and
 * get a lot of information about it.
 *
 * this is blaststegclient and it has to work with blaststegd
 *
 * this client sends anonymous hidden data over tcp reset packet commonly
 * generated for blaster workaround
 *
 * blaststegd could listen for a lot of sessions and make a file for each one
 * with the session dump
 *
 * the session inside the chaos is discriminated with a pre-shared key
 *
 * protocol to send packets must respect some things:
 * - the first packet of a session must have tcp->ack matching and ip->id
 *   contains the length of complete data
 * - the destination port is kept as tracking system, when is matched a
 *   packet with that port it checks if the stream is followed and if it is
 *   then the ack field and one byte of id is kept as incoming data.
 *   each packet contains 5 bytes of data and 3 bytes for session tracking.
 */
#define _GNU_SOURCE
/* the header names were lost in this copy of the article: the list below
 * is inferred from what the code uses on linux/glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include "blaststeg.h"

/* return base plus a pseudo-random offset in [0, max - min) */
unsigned int getrand (unsigned int base, unsigned int min, unsigned int max)
{
    unsigned int ret, diff;

    diff = (max - min);

    /* reseeded at every call: two calls with the same range within the
     * same second return the same value */
    srandom (time (NULL) + diff);

    if ((ret = random ()) != 0)
        ret %= diff;

    return (base + ret);
}

/* sum of 16 bit words, not folded yet */
static inline unsigned int half_cksum (const unsigned short *data, int len)
{
    unsigned int sum = 0x00;
    unsigned short carry = 0x00;

    while (len > 1)
    {
        sum += *data++;
        len -= 2;
    }

    if (len == 1)
    {
        *((unsigned short *) &carry) = *(unsigned char *) data;
        sum += carry;
    }

    return sum;
}

/* fold the 32 bit sum into 16 bits and complement it */
static inline unsigned short compute_sum (unsigned int sum)
{
    sum = (sum >> 16) + (sum & 0xffff);
    sum += (sum >> 16);

    return (unsigned short) ~sum;
}

void send_packet (int fd, struct iphdr *ip, struct tcphdr *tcp)
{
    unsigned int sum;
    struct sockaddr_in sa;

    memset (&sa, 0x00, sizeof (sa));
    sa.sin_addr.s_addr = ip->daddr;
    sa.sin_port = tcp->dest;
    sa.sin_family = AF_INET;

    /* ip check */
    ip->check = 0;
    sum = half_cksum ((unsigned short *) ip, sizeof (struct iphdr));
    ip->check = compute_sum (sum);

    /* tcp check: pseudo header (addresses, protocol, length) + tcp header */
    tcp->check = 0;
    sum = half_cksum ((unsigned short *) &ip->saddr, 8);
    sum += htons (IPPROTO_TCP + sizeof (struct tcphdr));
    sum += half_cksum ((unsigned short *) tcp, sizeof (struct tcphdr));
    tcp->check = compute_sum (sum);

    if ((sendto (fd, (void *) ip, MAXRSTSIZE, 0,
                 (struct sockaddr *) &sa, sizeof (sa))) == -1)
    {
        printf ("unable to send sock raw packet!\n");
        exit (1);
    }
}

int main (int argc, char **argv)
{
    unsigned char stream[STREAMSIZE], packet[MAXRSTSIZE];
    unsigned int counter = 0, delay;
    int fd, hdrincl = 1, filelen;
    struct iphdr *ip = (struct iphdr *) packet;
    struct tcphdr *tcp = (struct tcphdr *) (packet + sizeof (struct iphdr));
    FILE *source;

    if (argc != 4)
    {
        printf ("%s data_file session_key dest_host\n", *argv);
        exit (1);
    }

    printf ("PRIVACY PROTECTION SOFTWARE - example of hiding data on\n"
            "apparently common worm traffic error - www.s0ftpj.org\n"
            "check about other information, it could be useful to understand\n"
            "the limits, the working system and the motivation before run this\n"
            "steganographic software. coded by vecna@s0ftpj.org\n");

    if ((source = fopen (argv[1], "r")) == NULL)
    {
        printf ("unable to open file %s\n", argv[1]);
        exit (1);
    }

    fseek (source, 0, SEEK_END);
    filelen = ftell (source);
    rewind (source);

    compute_stream (stream, (unsigned char *) argv[2]);

    if ((fd = socket (PF_INET, SOCK_RAW, IPPROTO_TCP)) == -1)
    {
        printf ("unable to open socket raw (are you root ?)\n");
        exit (1);
    }

    setsockopt (fd, SOL_IP, IP_HDRINCL, &hdrincl, sizeof (int));

    /* default ip hdr */
    memset ((void *) ip, 0x00, sizeof (struct iphdr));
    ip->saddr = inet_addr ("127.0.0.1");
    ip->daddr = inet_addr (argv[3]);
    ip->ihl = 5;
    ip->version = 4;
    ip->protocol = IPPROTO_TCP;
    ip->tot_len = htons (MAXRSTSIZE);
    ip->ttl = (unsigned char) getrand (124, 0, 6);

    /* default tcp hdr */
    memset ((void *) tcp, 0x00, sizeof (struct tcphdr));
    tcp->doff = 5;
    tcp->rst = 1;
    tcp->ack = 1;
    tcp->source = htons (80);
    tcp->dest = htons ((unsigned short) getrand (1024, 0, 920));

    /* initialization ip settings: the file length travels in a 16 bit
     * field, so the file must be smaller than 64 Kb */
    ip->id = filelen;

    /* initialization tcp settings: first 4 bytes of the expanded key */
    memcpy ((unsigned char *) &tcp->ack_seq, &stream[counter], 4);
    counter += 4;

    send_packet (fd, ip, tcp);

    while (1)
    {
        delay = getrand (BASEDELAY, 0, MAXDELAY);
        sleep (delay);

        ip->ttl = (unsigned char) getrand (124, 0, 6);

        /* first byte of id: next byte of the expanded key (tracking),
         * second byte of id + ack_seq: 5 bytes of data */
        memcpy (&ip->id, &stream[counter], 1);
        fread ((unsigned char *) &ip->id + 1, 1, 1, source);

        /* end of file: fill ack_seq so that unused bytes look random */
        if (filelen < 4)
            tcp->ack_seq = getrand (0, 123, 4000000);

        fread ((void *) &tcp->ack_seq, 4, 1, source);

        send_packet (fd, ip, tcp);

        if (++counter == STREAMSIZE)
            counter = 0;

        filelen -= 5;
        if (filelen < 0)
            break;
    }

    printf ("file sent\n");
    return 0;
}
<-X->

<-| stego/blaster/blaststegd.c |->
/*
 * Sat Jan 31 10:55:55 CET 2004
 * vecna@s0ftpj.org
 *
 * example of steganography simulating common (worm) traffic,
 *
 * http://www.securityfocus.com/archive/75/335132/2003-08-21/2003-08-27/0
 * http://cert.uni-stuttgart.de/archive/incidents/2003/10/msg00143.html
 *
 * this code is part of a BFi article, go to www.s0ftpj.org and
 * get a lot of information about this.
 *
 * this is blaststegd and it has to work with blaststegsender,
 *
 * this blaststegd could listen for a lot of sessions and make a file for each
 * one with the session dump
 *
 * the client sends anonymous hidden data over tcp reset packet commonly
 * generated for blaster workaround
 *
 * the session inside the chaos is discriminated with a pre-shared key
 * the format of file required from blaststegd is a simple list of
 * pre-shared key.
 *
 * compiled under linux, gcc file.c -o output
 */
#define _GNU_SOURCE
/* the header names were lost in this copy of the article: the list below
 * is inferred from what the code uses on linux/glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/param.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <net/ethernet.h>
#include "blaststeg.h"

struct session
{
    char *key;
    unsigned char stream[STREAMSIZE];
    unsigned short port;
    unsigned int counter, length, readbyte;
    FILE *dump;
};

struct session *tracked;

int main (int argc, char **argv)
{
    FILE *key;
    char line[MAXKEYLEN], packet[MAXRSTSIZE], *dump_directory;
    int sockfd, i = 0, list_sess = 0;

    if (argc < 2)
    {
        printf ("%s pre-shared-password-file \n", *argv);
        exit (1);
    }

    if (argc == 3)
        dump_directory = argv[2];
    else
        dump_directory = strdup ((const char *) get_current_dir_name ());

    printf ( /* BANNER! :) SPAM && INFO */
        "KEEP YOUR PRIVACY! this is a free software for communication hiding\n"
        "blaststeg daemon, anonymous steganographed packets receiver\n"
        "coded on Sat Jan 31 2004 vecna@s0ftpj.org, http://www.s0ftpj.org\n"
        "READ ALL ABOUT THIS SOFTWARE, I'm acting like worm-traffic emulation...\n"
        "if a lot of time is past since the time i coded this it could be obsolete\n"
        "and insecure!\n"
        "this software could make some complete/incomplete dump on %s\n"
        "don't forget about that\n\n", dump_directory);

    if ((key = fopen (argv[1], "r")) == NULL)
    {
        printf ("unable to open file %s\n", argv[1]);
        exit (1);
    }

    /* count the keys, one for each line */
    while (fgets (line, MAXKEYLEN, key) != NULL)
        list_sess++;

    if ((tracked = (void *) calloc (list_sess, sizeof (*tracked))) == NULL)
    {
        printf ("unable to alloc memory\n");
        exit (1);
    }

    rewind (key);

    while (i < list_sess && fgets (line, MAXKEYLEN, key) != NULL)
    {
        /* strip '\n' */
        line[strcspn (line, "\n")] = 0x00;

        /* an empty key can't be expanded */
        if (line[0] == 0x00)
            continue;

        tracked[i].key = strdup (line);
        compute_stream (tracked[i].stream, (unsigned char *) line);
        i++;
    }
    list_sess = i;

    if ((sockfd = socket (PF_PACKET, SOCK_DGRAM, htons (ETH_P_IP))) == -1)
    {
        printf ("unable to open datalink layer socket\n");
        exit (1);
    }

    while (read (sockfd, packet, MAXRSTSIZE) != -1)
    {
        static char fname[MAXPATHLEN], newname[MAXPATHLEN];
        struct iphdr *ip = (struct iphdr *) packet;
        struct tcphdr *tcp = (struct tcphdr *) (packet + sizeof (*ip));

        if (ip->protocol != IPPROTO_TCP)
            continue;

        if (!tcp->rst || !tcp->ack)
            continue;

        if (ip->ihl != 5 || tcp->doff != 5)
            continue;

        /* search if is a new session */
        for (i = 0; i < list_sess; i++)
        {
            if (!memcmp (tracked[i].stream, &tcp->ack_seq, 4))
            {
                printf ("new [%s] session detected\n", tracked[i].key);

                /* tcp->dest is used as found in the packet (network byte
                 * order, no ntohs), so the number in the file name is not
                 * the port as tcpdump would print it */
                sprintf (fname, "%s/%s-%u-%s",
                         dump_directory, tracked[i].key, tcp->dest, ACTVSTR);

                if (tracked[i].dump != NULL)
                {
                    printf ("truncation of incomplete session %s-%u-%s\n",
                            tracked[i].key, tracked[i].port, ACTVSTR);
                    fclose (tracked[i].dump);
                }

                if ((tracked[i].dump = fopen (fname, "w+")) == NULL)
                {
                    printf ("unable to open dump %s\n", fname);
                    exit (1);
                }

                tracked[i].port = tcp->dest;
                tracked[i].length = ip->id;
                tracked[i].counter = 4;
                tracked[i].readbyte = 0;
                break;
            } /* new session check */

            /* match session continuation */
            if (tracked[i].dump != NULL && tracked[i].port == tcp->dest)
            {
                unsigned char check;

                memcpy (&check, &ip->id, 1);

                if (check == tracked[i].stream[tracked[i].counter])
                {
                    fwrite ((char *) (&ip->id) + 1, 1, 1, tracked[i].dump);
                    fwrite ((char *) &tcp->ack_seq, 1, 4, tracked[i].dump);

                    if (++(tracked[i].counter) == STREAMSIZE)
                        tracked[i].counter = 0;

                    tracked[i].readbyte += 5;

                    if (tracked[i].readbyte < tracked[i].length)
                        break;

                    /* else: session is finished */
                    sprintf (fname, "%s/%s-%u-%s", dump_directory,
                             tracked[i].key, tracked[i].port, ACTVSTR);
                    sprintf (newname, "%s/%s-%u", dump_directory,
                             tracked[i].key, tracked[i].port);

                    /* the file is not truncated here: the padding bytes of
                     * the last packet stay at the end of the dump */
                    if (tracked[i].readbyte > tracked[i].length)
                        fseek (tracked[i].dump, tracked[i].length, SEEK_SET);
                    /* else: readbyte == length and doesn't required fseek */

                    fclose (tracked[i].dump);
                    rename (fname, newname);

                    printf ("session closed and saved on %s (%u byte)\n",
                            newname, tracked[i].length);

                    tracked[i].port = 0;
                    tracked[i].dump = NULL;
                    tracked[i].counter = tracked[i].length =
                        tracked[i].readbyte = 0;
                }
            }
        } /* for rolling over tracked[] */
    } /* while read */

    /* never reached if read don't make error */
    printf ("error reading at raw sock layer\n");
    exit (1);
}
<-X->

This is how it works client side:

schlafen:blaststeg# man ls > secret
Reformatting ls(1), please wait...
schlafen:blaststeg# md5sum secret
f306648d7e04892e23ed31526e55161d secret
schlafen:blaststeg# gzip -9 secret
schlafen:blaststeg# ./blaststegsender secret.gz "antanisuperlativo" 192.168.1.1
PRIVACY PROTECTION SOFTWARE - example of hiding data on
apparently common worm traffic error - www.s0ftpj.org
check about other information, it could be useful to understand
the limits, the working system and the motivation before run this
steganographic software. coded by vecna@s0ftpj.org
file sent
schlafen:blaststeg#

And this is how it works server side:

511./tmp# ./blaststegd keyfile
KEEP YOUR PRIVACY!
this is a free software for communication hiding
blaststeg daemon, anonymous steganographed packets receiver
coded on Sat Jan 31 2004 vecna@s0ftpj.org, http://www.s0ftpj.org
READ ALL ABOUT THIS SOFTWARE, I'm acting like worm-traffic emulation...
if a lot of time is past since the time i coded this it could be obsolete
and insecure!
this software could make some complete/incomplete dump on /tmp
don't forget about that

new [antanisuperlativo] session detected
session closed and saved on /tmp/antanisuperlativo-5127 (3141 byte)

[1]+ Stopped ./blaststegd keyfile
512./tmp# file antanisuperlativo-5127
antanisuperlativo-5127: gzip compressed data, deflated, original filename, last modified: Tue Feb 3 02:03:17 2004, max compression, os: Unix
513./tmp# mv antanisuperlativo-5127 tmp.gz
514./tmp# gzip -d tmp.gz
gzip: tmp.gz: decompression OK, trailing garbage ignored
515./tmp# md5sum tmp
f306648d7e04892e23ed31526e55161d tmp
516./tmp# fg
./blaststegd keyfile

This is just an example of how important it is to be able to look like
something common, inside which we encode information (publicly: to be
considered safe a stegosystem, like a cryptosystem, must not rely on a secret
algorithm) under the control of a key, so that the data can be extracted only
by whoever holds that key.


5) OUTRO

Steganography is applicable to anything; the only requirement is an
implementation. Instead of writing entire standalone programs, a way to bring
steganography 'to the masses' could be to patch existing software so that it
looks for a centralized steganographic engine on the host machine: the patch
would decode the steganographic stream from the lower-level protocol (e.g.
HTTP) and the centralized engine would do the rest. The engine could thus
decode any type of file from any type of stream, possibly reassembling the
original object when it is made up of different formats or comes from
different streams, or throw the data away.
This way, by avoiding independent programs such as "insert raw data, insert
data in format XXXX" and "extract data from format XXXX", less work is
required. If someone wants to give it a try, send me an e-mail :)

Finally, I would like to add a few things after receiving comments on the
pre-release of this article:

- innova could be considered stuck at version 0.0.1, instead of 0.0.2

- when planning a new steganographic protocol it is important to choose where
  to hide data in the original stream, but it is even more important to ask
  "how different is my communication from a standard one?"
  Possible answers to this question are:
      "a lot" - which is VERY BAD
      "not so much, but some statistical analysis can easily reveal the
       difference" - meaning that the software is BARELY USABLE
      "it is not possible to distinguish them in any way" - which is
       VERY GOOD.

- this article is mainly about two things: the first is the use of
  steganography in network communications, the other is a general excursus
  on using the same options a user may play with during normal activity as
  parameters for a steganographic channel. The use of bold and font
  variations is a borderline example, but it should not be underestimated.
  Any user-definable element can be used as a steganographic transport
  container. If abusing one of those systems could introduce a weakness in
  the system itself, many systems can be used simultaneously, thus sharing
  the load and the risk of statistical analysis.

- Any steganographic system can be applied to extract data from any
  underlying stream. If I combine bold, font variations and extraction of the
  low bits of JPG images in a web page, and I protect myself by using only
  10% of each element, then I can apply the reverse procedure to ALL web
  pages, and what I obtain MUST look uniformly random, because the user's
  entropy is the best place to hide.
  Raw data to be hidden should be preprocessed to be as similar as possible
  to the data that can be extracted from non-steganographic pages, especially
  with regard to statistical distribution (plain text, for instance, has a
  very recognizable one). The simplest and maybe most effective manipulation
  is compression and, if desired, encryption. (Obviously, format-specific
  headers such as gzip's or ACE's must be removed.) This way, brute-force
  attempts to get at the steganographed data are very likely to fail.

- In my opinion, steganography can't be stopped. In any way. Even in a
  scenario with packet filtering, trusted systems and software, layer 5
  content filtering and so on, it would still be possible to communicate
  securely. Just think of macros in well-known word processing software,
  which may be implemented in crazy programming languages: even in this
  terrifying hypothetical computing scenario it would be possible to hack,
  create or generate documents that carry more content than their appearance
  suggests, using steganography. Then, to get out of this prohibitive
  situation, sending them by e-mail or publishing them on public forums would
  allow hidden communication.

- Using higher layer protocols as a transmission medium offers a greater
  number of places to hide data, so any type of analysis becomes harder:
      * a lot of formats
      * a lot of possible options, each requiring a dedicated analysis
      * evolution of protocols and formats over time
  This raises the number of false positives even further.

Basically, the more complex the underlying system, the simpler it is to find
places to hide safely. The l-user will never, ever behave predictably.


6) THANKS

CCP, Guccini, Cardigans, Latte e i suoi derivati, Scisma, Vivaldi and
Radio Cybernet.

Acaso and MD for letting me escape from shell's labyrinths

Metro Olografix crypto meeting (http://www.olografix.org) and e-privacy
(http://e-privacy.firenze.linux.it) for their contribution in keeping my
interest in these topics alive.

Smaster for the logs suitable for the analysis, which could otherwise have
remained just an idea.

If you are looking for transcendental illumination in malabyte:
http://3564020356.org/

Zeist! The first person who read the pre-release and even understood it :)

odo, who at the last minute reminded me of one of the best pieces of
steganography software, which is transparent, simple and written in
Italian :) (http://www.autistici.org/bakunin/) (mod_stego), and which
respects the browser's rendering of HTML pages while injecting data at the
same time.

and even if I _do_ support freedom of choice:

        _ALWAYS_ ENCRYPT YOUR E-MAIL!

http://www.gnupg.org/


-[ WEB ]----------------------------------------------------------------------

        http://bfi.s0ftpj.org           [main site - IT]
        http://bfi.cx                   [mirror - IT]
        http://bfi.freaknet.org         [mirror - AT]
        http://bfi.anomalistic.org      [mirror - SG]
        http://bfi.slackit.org          [mirror - DE]

-[ E-MAiL ]-------------------------------------------------------------------

        bfi@s0ftpj.org

-[ PGP ]----------------------------------------------------------------------

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: 2.6.3i

mQENAzZsSu8AAAEIAM5FrActPz32W1AbxJ/LDG7bB371rhB1aG7/AzDEkXH67nni
DrMRyP+0u4tCTGizOGof0s/YDm2hH4jh+aGO9djJBzIEU8p1dvY677uw6oVCM374
nkjbyDjvBeuJVooKo+J6yGZuUq7jVgBKsR0uklfe5/0TUXsVva9b1pBfxqynK5OO
lQGJuq7g79jTSTqsa0mbFFxAlFq5GZmL+fnZdjWGI0c2pZrz+Tdj2+Ic3dl9dWax
iuy9Bp4Bq+H0mpCmnvwTMVdS2c+99s9unfnbzGvO6KqiwZzIWU9pQeK+v7W6vPa3
TbGHwwH4iaAWQH0mm7v+KdpMzqUPucgvfugfx+kABRO0FUJmSTk4IDxiZmk5OEB1
c2EubmV0PokBFQMFEDZsSu+5yC9+6B/H6QEBb6EIAMRP40T7m4Y1arNkj5enWC/b
a6M4oog42xr9UHOd8X2cOBBNB8qTe+dhBIhPX0fDJnnCr0WuEQ+eiw0YHJKyk5ql
GB/UkRH/hR4IpA0alUUjEYjTqL5HZmW9phMA9xiTAqoNhmXaIh7MVaYmcxhXwoOo
WYOaYoklxxA5qZxOwIXRxlmaN48SKsQuPrSrHwTdKxd+qB7QDU83h8nQ7dB4MAse
gDvMUdspekxAX8XBikXLvVuT0ai4xd8o8owWNR5fQAsNkbrdjOUWrOs0dbFx2K9J
l3XqeKl3XEgLvVG8JyhloKl65h9rUyw6Ek5hvb5ROuyS/lAGGWvxv2YJrN8ABLo=
=o7CG
-----END PGP PUBLIC KEY BLOCK-----


==============================================================================
-----------------------------------[ EOF ]------------------------------------
==============================================================================