Remove BOM (Byte Order Mark) ÿþ Unicode 65279 character

Many programmers run into irritating problems caused by the Byte Order Mark (BOM). They want to remove the BOM character, but it stubbornly stays put in their HTML, XML, or plain-text files. On investigation, they find that they need to remove the ÿþ or Unicode 65279 character to get rid of an extra space or newline in their files. Let’s examine the issue and see how to remove the BOM.

What is Byte Order Mark?

The Byte Order Mark (BOM) is a signature at the very start of a Unicode document that tells the computer how the bytes are ordered. Because Unicode text can be stored in 8-, 16- or 32-bit code units (UTF-8, UTF-16 and UTF-32), the computer needs to know which encoding, and which byte order, the document uses. The BOM tells it exactly that.

The BOM is actually the “zero-width no-break space” character, U+FEFF (decimal 65279). It is invisible when rendered, but it is not a NULL character: it occupies two to four real bytes at the start of the file, depending on the encoding.

Unicode Encoding	In ISO-8859-1 BOM appears as
UTF-8	`ï»¿`
UTF-16 (big endian)	`þÿ`
UTF-16 (little endian)	`ÿþ`
UTF-32 (big endian)	`□□þÿ` (□ is the ASCII null character)
UTF-32 (little endian)	`ÿþ□□` (□ is the ASCII null character)
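
The table can be turned into a quick check from the command line. The following is a minimal sketch, assuming a system with the standard head, od and tr utilities; bom_type is a hypothetical helper name chosen for this example, not an existing command.

```shell
# bom_type FILE: print which BOM (if any) FILE starts with.
# bom_type is a hypothetical helper name, not a standard command.
bom_type() {
    # Render the first four bytes as one hex string, e.g. "efbbbf3c"
    sig=$(head -c 4 < "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        0000feff) echo "UTF-32 (big endian)" ;;
        # Tested before UTF-16 LE, because both start with ff fe.
        # A UTF-16 LE file whose first character is NUL looks identical.
        fffe0000) echo "UTF-32 (little endian)" ;;
        efbbbf*)  echo "UTF-8" ;;
        feff*)    echo "UTF-16 (big endian)" ;;
        fffe*)    echo "UTF-16 (little endian)" ;;
        *)        echo "none" ;;
    esac
}
```

Calling bom_type file.xml prints one of the encoding names from the table, or "none" when the file does not start with any of the listed byte sequences.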

In HTML code the BOM character can also appear as &#65279;

Remove BOM from an XML file

Just open the file in the Vim text editor and use the “nobomb” option:

# vim file.xml
:set nobomb
:wq

Removal from HTML Files

When faced with the BOM problem, many web developers try setting the encoding of their page to “charset=utf-8” through a meta tag. But doing this does not remove the BOM. If a BOM character is causing problems in how your HTML displays, the problem actually lies in how the text editor saved the file, not in your HTML/CSS code.

Most HTML editors, like Dreamweaver, Programmer’s Notepad, TextPad etc., provide a way to disable the BOM. The option usually appears where you set the file encoding in the editor, with a label such as “UTF-8 without BOM” or “UTF-8 No BOM”.

The appearance of the &#65279; character in your HTML code can also be solved using the above encoding change in your HTML editor. Just set the encoding to one without a BOM and then save the file.

[Screenshot: Setting UTF-8 without BOM in Macromedia Dreamweaver]

[Screenshot: Setting UTF-8 without BOM in Programmer’s Notepad]

Detection and Removal of BOM in Linux

Linux commands make it easy to find the BOM character and remove it from files. Tools like find, grep and sed, combined with a little shell scripting, make it a cakewalk. Here is how we can do it:

Find the list of files containing BOM characters

find /var/www/website/ -type f -print -exec hd -n 3 {} \; | grep -B 1 "ef bb bf" | grep "some_part_of_the_path" > bom_lines.txt
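
The pipeline above depends on the hd tool being installed and on a fragment of the path to pick the file names back out of the hex dump. As an alternative, here is a minimal sketch that prints only the matching paths, assuming find, head, od and tr are available; list_utf8_bom_files is a hypothetical helper name used for illustration.

```shell
# list_utf8_bom_files DIR: print every regular file under DIR whose first
# three bytes are EF BB BF (the UTF-8 BOM), one path per line.
# list_utf8_bom_files is a hypothetical helper name.
list_utf8_bom_files() {
    find "$1" -type f -exec sh -c '
        for f do
            # Compare the first three bytes, rendered as hex, with the BOM
            if [ "$(head -c 3 < "$f" | od -An -tx1 | tr -d " \n")" = "efbbbf" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

Running list_utf8_bom_files /var/www/website/ > bom_lines.txt would produce a list in the same one-path-per-line form that the removal step reads. File names containing spaces are handled; names containing newlines would still break a line-based list.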

Remove BOM character

while IFS= read -r l; do sed -i '1 s/^\xef\xbb\xbf//' "$l"; done < bom_lines.txt
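
Note that the sed command only matches the three UTF-8 BOM bytes. A file that starts with ÿþ is, according to the table earlier, UTF-16 (little endian), so cutting off two bytes would still leave UTF-16 text behind. In that case the whole file probably needs re-encoding. The following is a minimal sketch, assuming GNU iconv is installed and that the file really is UTF-16 with a BOM; utf16_to_utf8 is a hypothetical helper name.

```shell
# utf16_to_utf8 FILE: rewrite a UTF-16 file (with BOM) as UTF-8 without a BOM.
# utf16_to_utf8 is a hypothetical helper name; assumes GNU iconv.
utf16_to_utf8() {
    tmp_out=$(mktemp) || return 1
    # "-f UTF-16" lets iconv use the BOM to pick the byte order;
    # the BOM itself is consumed and not written to the UTF-8 output.
    if iconv -f UTF-16 -t UTF-8 < "$1" > "$tmp_out"; then
        # Overwrite in place so the original permissions are kept
        cat "$tmp_out" > "$1"
        rm -f "$tmp_out"
    else
        # Conversion failed: leave the original file untouched
        rm -f "$tmp_out"
        return 1
    fi
}
```

It is worth keeping a backup copy before running utf16_to_utf8 on a file, since the conversion overwrites it.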

So that was it! This is how you can remove the BOM character from your program or text file. I decided to write this article because I wasted two hours learning how to get rid of the nuisance of the ÿþ Unicode 65279 character. Once I had learned it, I thought it should be documented so that other programmers can save some time!

I hope it was useful for you. Thank you for using TechWelkin.

Gregory Alter says:

October 27, 2022 at 8:48 am

Hi, I just want to thank you so much for aiding my detective work on figuring out what ÿþ means! Although I am still not sure that the BOM is what makes it impossible for my TXT file to process properly, so that I can submit Electronic Data Interchange (EDI) files to get paid for my work, there is a good chance that it is my problem, and you have offered the best help yet in my search for a way to get paid for my work! I purchased Sublime for $90 and downloaded VIM and can’t figure out how to get either of them to rid my files of the BOM. You mentioned Programmer’s Notepad, where I can simply load a file, set the property to turn the BOM off, and maybe it will work! If you see this message, I am curious about why I cannot see the BOM in the text file itself, and why some of these editors make it so difficult to eliminate the BOM. I, for example, was converting from PDF to plain text and trying to preserve the layout so that the text file looks the same as the PDF. The particular program, PDF Escape, which costs me a licensing fee, offers no options for the Convert to Text feature, so I had no idea there was a problem until I submitted a test file for processing and received only one hint in return: ÿþ. I loaded these files into MS Word and selected various encoding schemes, all without the simple option of NOBOM! OMG! Anyway, thanks for your assistance and any comments you can offer.

Luciano Fernandes says:

April 6, 2017 at 10:54 pm

I did everything and nothing works. :(

srikesavan says:

March 11, 2016 at 10:57 am

Hi Lalit what about jquery?

Lalit Kumar says:

March 11, 2016 at 10:58 am

Please elaborate on your question, Srikesavan.

- srikesavan says:
  
  March 11, 2016 at 11:09 am
  
  I tried this for jQuery: replace(/([\x00-\x7F])|./g, "$1");
  But I need to remove only this type of character: “&#65279”. No need to remove the blank space as well, Lalit.