Convert Bytes to String in Python: A Tutorial for Beginners



 

In Python, strings are immutable sequences of characters that are human-readable and typically encoded in a specific character encoding, such as UTF-8. Bytes, on the other hand, represent raw binary data. A bytes object is immutable and consists of an array of bytes (8-bit values). In Python 3, string literals are Unicode by default, while bytes literals are prefixed with a b.

Converting bytes to strings is a common task in Python, particularly when working with data from network operations, file I/O, or responses from certain APIs. This is a tutorial on how to convert bytes to strings in Python.

 

1. Convert Bytes to String Using the decode() Method

 

The most straightforward way to convert bytes to a string is using the decode() method on the bytes object (or the byte string). This method takes the character encoding to decode with, which defaults to UTF-8 if omitted.

Note: Strings do not have an associated binary encoding and bytes do not have an associated text encoding. To convert bytes to a string, you can use the decode() method on the bytes object. And to convert a string to bytes, you can use the encode() method on the string. In either case, specify the encoding to be used.
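As a quick illustration of the note above, here is a minimal round-trip sketch. The sample text and variable names are illustrative choices, not part of the tutorial's examples:

```python
# Round trip: str -> bytes -> str, using the same encoding both ways
text = 'café'

encoded = text.encode('utf-8')     # str to bytes
decoded = encoded.decode('utf-8')  # bytes back to str

print(encoded)                   # the raw bytes, with the accented letter as two bytes
print(len(text), len(encoded))   # character count vs. byte count
print(decoded == text)           # the round trip preserves the text
```

The character count and the byte count differ because a single non-ASCII character can take more than one byte in UTF-8.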

Example 1: UTF-8 Encoding

Here we decode byte_data, which holds UTF-8-encoded text, to a string using the decode() method:

# Sample byte object
byte_data = b'Hello, World!'

# Converting bytes to string
string_data = byte_data.decode('utf-8')

print(string_data)

 

You should get the following output:

Output >>>
Hello, World!

 

You can verify the data types before and after the conversion like so:

print(type(byte_data))
print(type(string_data))

 

The data types should be as expected:

Output >>>
<class 'bytes'>
<class 'str'>

 

Example 2: Handling Other Encodings

Sometimes, the byte sequence may be encoded with an encoding other than UTF-8. You can handle this by specifying the corresponding encoding scheme when you call the decode() method on the bytes object.
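To see why the encoding name matters, here is a small sketch (the variable names are illustrative, not from the tutorial's examples) that decodes the same UTF-8 bytes once with the matching codec and once with latin-1. The mismatched call does not raise, because latin-1 maps every byte to a character, but the text comes out garbled:

```python
# The same bytes, decoded with the right and the wrong encoding
data = b'\xe4\xbd\xa0\xe5\xa5\xbd'   # two Chinese characters encoded as UTF-8

right = data.decode('utf-8')     # matches how the data was encoded
wrong = data.decode('latin-1')   # no exception, but one character per byte

print(len(right), right)
print(len(wrong), repr(wrong))
```

Because no exception is raised in the second case, a wrong encoding can go unnoticed until someone reads the text.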

Here's how you can decode a byte string with UTF-16 encoding:

# Sample byte object (UTF-16, little-endian, with a leading byte order mark)
byte_data_utf16 = b'\xff\xfeH\x00e\x00l\x00l\x00o\x00,\x00 \x00W\x00o\x00r\x00l\x00d\x00!\x00'

# Converting bytes to string
string_data_utf16 = byte_data_utf16.decode('utf-16')

print(string_data_utf16)

 

And here's the output:

Output >>>
Hello, World!

 

Using Chardet to Detect Encoding

In practice, you may not always know the encoding scheme used. And mismatched encodings can lead to errors or garbled text. So how do you get around this?

You can use the chardet library (install it using pip: pip install chardet) to detect the encoding, and then pass the detected encoding to the decode() method call. Here's an example:

import chardet

# Sample byte object with unknown encoding
byte_data_unknown = b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Detecting the encoding
detected_encoding = chardet.detect(byte_data_unknown)
encoding = detected_encoding['encoding']
print(encoding)

# Converting bytes to string using detected encoding
string_data_unknown = byte_data_unknown.decode(encoding)

print(string_data_unknown)

 

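You should get a similar output: the detected encoding (utf-8 for this input, if detection succeeds), followed by the decoded string 你好.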

 

Error Handling in Decoding

 

The bytes object that you're working with may not always be valid; it may sometimes contain sequences that are invalid for the specified encoding. This can lead to errors.

Here, byte_data_invalid contains the invalid sequence \xff:

# Sample byte object with invalid sequence for UTF-8
byte_data_invalid = b'Hello, World!\xff'

# Try converting bytes to string
string_data = byte_data_invalid.decode('utf-8')

print(string_data)

 

When you try to decode it, you'll get the following error:

Traceback (most recent call last):
  File "/home/balapriya/bytes2str/main.py", line 5, in <module>
    string_data = byte_data_invalid.decode('utf-8')
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 13: invalid start byte

 

But there are a couple of ways you can handle these errors. You can ignore such errors when decoding, or you can replace invalid sequences with a placeholder.
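If you would rather know when something went wrong before falling back to one of these options, you can also catch the exception yourself. The sketch below is one possible approach, not something the tutorial prescribes; safe_decode is a hypothetical helper name, and falling back to the replace handler is an assumption made for illustration:

```python
def safe_decode(data, encoding='utf-8'):
    """Decode strictly; on failure, report the problem and fall back to 'replace'."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start and exc.end mark the offending byte range in the input
        bad = data[exc.start:exc.end]
        print(f'Bad bytes {bad!r} at position {exc.start}: {exc.reason}')
        return data.decode(encoding, errors='replace')


clean = safe_decode(b'Hello, World!')        # decodes without any message
repaired = safe_decode(b'Hello, World!\xff') # prints a message, then falls back

print(clean)
print(repaired)
```

Catching UnicodeDecodeError keeps the failure visible (here through a printed message) instead of silently altering the data.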

 

Ignoring Errors

To ignore invalid sequences when decoding, you can set errors to ignore in the decode() method call:

# Sample byte object with invalid sequence for UTF-8
byte_data_invalid = b'Hello, World!\xff'

# Converting bytes to string while ignoring errors
string_data = byte_data_invalid.decode('utf-8', errors="ignore")

print(string_data)

 

You'll now get the following output without any errors:

Output >>>
Hello, World!

 

Replacing Errors

You can also replace invalid sequences with a placeholder. To do this, you can set errors to replace as shown:

# Sample byte object with invalid sequence for UTF-8
byte_data_invalid = b'Hello, World!\xff'

# Converting bytes to string while replacing errors with a placeholder
string_data_replace = byte_data_invalid.decode('utf-8', errors="replace")

print(string_data_replace)

 

Now the invalid sequence (at the end) is replaced by a placeholder:

Output >>>
Hello, World!�
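To compare the handlers side by side, the following sketch decodes the same invalid input with ignore and replace, plus backslashreplace, a third standard handler that the tutorial does not cover. It is included here only as an illustration; it keeps the invalid byte visible as an escape sequence:

```python
byte_data_invalid = b'Hello, World!\xff'

# Decode the same bytes with each error handler and collect the results
results = {}
for handler in ('ignore', 'replace', 'backslashreplace'):
    results[handler] = byte_data_invalid.decode('utf-8', errors=handler)
    print(f'{handler:>16}: {results[handler]!r}')
```

Which handler fits depends on whether you prefer to drop bad bytes, mark them, or preserve them for later inspection.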

 

2. Convert Bytes to String Using the str() Constructor

 

The decode() method is the most common way to convert bytes to a string. But you can also use the str() constructor to get a string from a bytes object. You can pass in the encoding scheme to str() like so:

# Sample byte object
byte_data = b'Hello, World!'

# Converting bytes to string
string_data = str(byte_data, 'utf-8')

print(string_data)

 

This outputs:

Output >>>
Hello, World!
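One thing to watch for with str(): if you leave out the encoding argument, Python does not decode anything and instead returns the printable representation of the bytes object. A short sketch of the difference (the variable names are illustrative):

```python
byte_data = b'Hello, World!'

without_encoding = str(byte_data)        # representation of the bytes object
with_encoding = str(byte_data, 'utf-8')  # decoded text

print(without_encoding)
print(with_encoding)
print(without_encoding == with_encoding)
```

The first result still carries the b prefix and the quotes as literal characters, which is rarely what you want when converting data.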

 

3. Convert Bytes to String Using the Codecs Module

 

Yet another method to convert bytes to a string in Python is using the decode() function from the built-in codecs module. This module provides convenience functions for encoding and decoding.

You can call the decode() function with the bytes object and the encoding scheme as shown:

import codecs

# Sample byte object
byte_data = b'Hello, World!'

# Converting bytes to string
string_data = codecs.decode(byte_data, 'utf-8')

print(string_data)

 

As expected, this also outputs:

Output >>>
Hello, World!
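The codecs module can also help when bytes arrive in pieces, for example from a network socket, and a multi-byte character may be split across two chunks. The sketch below goes beyond what the tutorial covers and uses a hand-made list of chunks as a stand-in for real network reads:

```python
import codecs

# Two Chinese characters in UTF-8, split so each character straddles a chunk boundary
chunks = [b'\xe4\xbd', b'\xa0\xe5', b'\xa5\xbd']

# An incremental decoder buffers incomplete sequences between calls
decoder = codecs.getincrementaldecoder('utf-8')()
pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b'', final=True))  # flush; raises if bytes are left over

text = ''.join(pieces)
print(len(text), text)
```

Calling decode() on each chunk separately would fail here, because the first chunk ends in the middle of a character.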

 

Summary

 

In this tutorial, we learned how to convert bytes to strings in Python while also handling different encodings and potential errors gracefully. Specifically, we learned how to:

  • Use the decode() method to convert bytes to a string, specifying the correct encoding.
  • Handle potential decoding errors using the errors parameter with options like ignore or replace.
  • Use the str() constructor to convert a valid bytes object to a string.
  • Use the decode() function from the codecs module that's built into the Python standard library to convert a valid bytes object to a string.

Happy coding!

 

 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
