Many programmers run into irritating problems caused by the Byte Order Mark (BOM). They try to remove the BOM character, but it stubbornly stays put in their HTML, XML, or plain-text files. Upon investigation, they find that they need to remove the ÿþ character (Unicode 65279) to get rid of an extra space or newline in their files. Let’s examine the issue and see how to remove the BOM, the byte order mark.
The Byte Order Mark (BOM) is a signal that tells the computer how the bytes are ordered in a Unicode document. Because Unicode text can be encoded in 8-, 16-, or 32-bit units (UTF-8, UTF-16, and UTF-32), it is important for the computer to know which encoding the document uses. The BOM tells the computer exactly that.
The BOM is actually a “zero-width non-breaking space”, an invisible character that takes up no room on screen, and it is represented as U+FEFF (decimal 65279).
| Unicode Encoding | In ISO-8859-1 BOM appears as |
|---|---|
| UTF-8 | ï»¿ |
| UTF-16 (big endian) | þÿ |
| UTF-16 (little endian) | ÿþ |
| UTF-32 (big endian) | □□þÿ (□ is the ASCII null character) |
| UTF-32 (little endian) | ÿþ□□ (□ is the ASCII null character) |
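The byte sequences in the table can be checked directly on a file. The following is a minimal sketch, not a definitive tool: `detect_bom` is a hypothetical helper name, and it assumes `head -c` and `od` are available, as they are on typical Linux systems. It reads the first four bytes as hex and compares them with the known BOM signatures.

```shell
#!/bin/sh
# Sketch: report which BOM (if any) a file starts with.
# detect_bom is a hypothetical helper name, not a standard command.
detect_bom() {
    # First four bytes as lowercase hex with no separators, e.g. "efbbbf3c".
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        0000feff) echo "UTF-32 (big endian)" ;;
        # Checked before UTF-16 LE because both start with ff fe.
        # Caveat: a UTF-16 LE file whose first character is U+0000 looks the same.
        fffe0000) echo "UTF-32 (little endian)" ;;
        efbbbf*)  echo "UTF-8" ;;
        feff*)    echo "UTF-16 (big endian)" ;;
        fffe*)    echo "UTF-16 (little endian)" ;;
        *)        echo "no BOM" ;;
    esac
}
```

For example, `detect_bom file.xml` prints the encoding name from the table when the file starts with the matching bytes, and `no BOM` otherwise.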
In HTML code, the BOM character can also appear as the numeric character reference &#65279;
Just open the file in the vim text editor, turn the BOM off with the “nobomb” option, and save:
# vim file.xml
:set nobomb
:wq
When faced with the BOM problem, many web developers try setting the encoding of their page to “charset=utf-8” through a meta tag. But doing this does not remove the BOM, because the mark is part of the bytes saved in the file. If a BOM character is causing problems in your HTML display, the problem actually lies in the text editor and not in your HTML/CSS code.
Most HTML editors, such as Dreamweaver, Programmer’s Notepad, and TextPad, provide a way to disable the BOM. The option usually appears where you set the file encoding in the editor, under a name like “UTF-8 without BOM” or “UTF-8 No BOM”.
The appearance of the BOM character in your HTML code can also be fixed with the encoding change described above: set the encoding to one without a BOM and then save the file.
Linux commands make it easier to find the BOM character and then remove it from files. Tools like grep, combined with shell scripting, make it a cakewalk. Here is how we can do it:
Find the list of files containing BOM characters
find /var/www/website/ -type f -print -exec hd -n 3 {} \; | grep -1 "ef bb bf" | grep "some_part_of_the_path" > bom_lines.txt
Remove BOM character
while IFS= read -r l; do sed -i '1 s/^\xef\xbb\xbf//' "$l"; done < bom_lines.txt
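As an alternative to the two-step approach, detection and removal can be combined so that no intermediate list file is needed. This is only a sketch under stated assumptions: `strip_utf8_bom` and `strip_bom_tree` are hypothetical helper names, it handles only the UTF-8 BOM, and it uses `tail -c +4` to skip the three BOM bytes instead of the sed substitution shown above. File names containing spaces are handled; file names containing newlines are not.

```shell
#!/bin/sh
# strip_utf8_bom FILE: remove a leading UTF-8 BOM (bytes EF BB BF) in place.
# Hypothetical helper, not a standard command.
strip_utf8_bom() {
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "efbbbf" ] || return 0   # no BOM: leave the file untouched
    tmp=$(mktemp) || return 1
    # tail -c +4 copies everything from the 4th byte on, skipping the BOM.
    # Writing back with cat keeps the original file's permissions.
    tail -c +4 "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# strip_bom_tree DIR: apply strip_utf8_bom to every regular file under DIR.
strip_bom_tree() {
    find "$1" -type f | while IFS= read -r f; do
        strip_utf8_bom "$f"
    done
}
```

Running `strip_bom_tree /var/www/website/` would then process the same directory as the commands above. Try it on a copy of the files first, since the change is made in place.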
So that’s it! This is how you can remove the BOM character from your program or text file. I decided to write this article because I wasted two hours learning how to remove the nuisance of the ÿþ (Unicode 65279) character. Once I had learned it, I thought it should be documented so that other programmers can save some time!
I hope it was useful for you. Thank you for using TechWelkin.
Hi, I just want to thank you so much in aiding my detective work on figuring out what ÿþ means! Although I am still not sure that the BOM is making it impossible for my TXT file to process properly in order for me to submit Electronic Data Interchange (EDI) files for getting paid for my work, there is a good chance that is my problem and you have offered the best help yet, in my search for a way to get paid for my work! I purchased Sublime for $90 and dl’d VIM and can’t figure out how to get either of them to rid my files of the BOM. You mentioned the Programmer’s Notepad, where I can simply load a file, set the property to turn BOM off, and maybe it will work! If you see this message I am curious about why I cannot see the BOM in the text file, itself, and why some of these editors make it so difficult to eliminate the BOM. I, for example, was converting from PDF to Plain Text and trying to preserve the layout to be visually the same in the text file as the PDF. The particular program, PDF Escape, which costs me a licensing fee, offers no options for the Convert to Text feature, so I had no idea there was a problem until I submitted a test file for processing, and received only one hint in return: ÿþ. I loaded these files into MS Word, selected various encoding schemes, all without the simple option of NOBOM! OMG! Anyway, thanks for your assistance and any comments you can offer.
I did everything and nothing works. :(
Hi Lalit what about jquery?
Please elaborate your question Srikesavan.
I tried this for jQuery: replace(/([\x00-\x7F])|./g, "$1");
But I need to remove only “” this type of character. There is no need to remove the blank space as well, Lalit.
Thanks. It worked for me using vim.
You are awesome!
Solved the problem of  within seconds.. After struggling for hours on my own..
Hi Hari, I am happy I could help you :-) Hope to hear more from you on TechWelkin! Stay connected!
Saved me a ton of time. Thanks for the detailed article!
In your page code highlighter is showing
as >